Introduction to weak Pinsker filtrations

Séverin Benzoni

Abstract

We introduce the so-called weak Pinsker dynamical filtrations, whose existence in any ergodic system follows from the universality of the weak Pinsker property, recently proved by Austin [1]. These dynamical filtrations appear as a potential tool to describe and classify positive entropy systems. We explore the links between the asymptotic structure of weak Pinsker filtrations and the properties of the underlying dynamical system. A central question is whether, on a given system, the structure of weak Pinsker filtrations is unique up to isomorphism. We give a partial answer, in the case where the underlying system is Bernoulli. We conclude our work by giving two explicit examples of weak Pinsker filtrations.

1 Introduction

In 1958, Kolmogorov and Sinaï introduced the notion of entropy in ergodic theory: the Kolmogorov-Sinaï entropy (or KS-entropy). The same year, Kolmogorov introduced another important notion: K-systems. He defined a K-system as a dynamical system $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ on which there is a generator $\xi$ whose tail $\sigma$ -algebra $\bigcap_{n\geq 1}\sigma(\xi_{]-\infty,-n]})$ is trivial. There is an equivalent definition that is more intrinsic to the system: $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is a K-system if, and only if, every non-trivial observable $\xi_{0}$ satisfies $h_{\mu}(\xi,T)>0$ (a proof of this equivalence, and a more complete presentation of this notion can be found in [4]). It is also equivalent to assume that the Pinsker factor of the system is trivial, the Pinsker factor being the $\sigma$ -algebra

\Pi_{\mathbf{X}}=\{A\in\mathscr{A}\;|\;h(\mathbbm{1}_{A},T)=0\}.

The Pinsker factor is simply the largest factor of $\mathbf{X}$ that is of entropy $0$ . Therefore, a K-system has no non-trivial factor of entropy $0$ : it is entirely non-deterministic. For example, the most elementary K-systems are the Bernoulli shifts. They are K-systems because i.i.d. processes satisfy Kolmogorov’s $0$ - $1$ law.

Entropy is an invariant that quantifies the “chaos” of a dynamical system, or more precisely its unpredictability, and many of the questions that arose after its discovery were aimed at understanding the structure of this “chaos”. The first question, which Kolmogorov asked after proving that Bernoulli shifts are K-systems, was whether all K-systems are Bernoulli shifts, which would imply that these chaotic systems have a very simple structure. More general questions then emerged, and we will return to them in the following paragraphs.

The discovery of entropy first led to non-isomorphism results, particularly for Bernoulli shifts: two isomorphic Bernoulli shifts must have the same entropy. The converse of this result, shown by Ornstein [11, 12], is one of the most notable successes of the KS-entropy. But the ramifications of Ornstein’s theory go far beyond Bernoulli shifts, and have had a profound impact on the evolution of ergodic theory. We will confine ourselves here to telling the story of the weak Pinsker property.

In the early 1960s, Pinsker, then working in Moscow with Kolmogorov, showed that any K-factor of $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is independent of the Pinsker factor $\Pi_{\mathbf{X}}$ (see [19], but this reference is in Russian). Following this result, although the existence of any specific K-factor had not yet been proved, he issued a conjecture (later called the “Pinsker conjecture”): any system of non-zero entropy is isomorphic to the direct product of its Pinsker factor and a K-system. A few years later, Sinai published [23] which seemed to confirm this conjecture: he proved the existence of a factor of $\mathbf{X}$ isomorphic to a Bernoulli shift of the same entropy as $\mathbf{X}$ . Given Pinsker’s independence result, it would have been sufficient to prove that the factor constructed by Sinaï and the Pinsker factor generate the entire $\sigma$ -algebra $\mathscr{A}$ to obtain a result even stronger than Pinsker’s conjecture: $\mathbf{X}$ would then be isomorphic to the direct product of its Pinsker factor and a Bernoulli shift. This “strong Pinsker conjecture” would also have proved that any K-system is isomorphic to a Bernoulli shift.

But this conjecture turned out to be false: Ornstein published a first example of a non-Bernoulli K-system [15] which contradicts the strong Pinsker conjecture. Following that, many other counterexamples were built, showing that the family of all K-systems is very broad, leaving little hope for a complete classification of those systems. Among all these counterexamples, we can find a construction by Ornstein [13] that can be used to contradict Pinsker’s conjecture. Furthermore, he then refined this result by constructing a mixing system that does not verify Pinsker’s conjecture [14]. Thus, all the conjectures formulated in the early years of the study of KS-entropy were wrong, revealing a wide variety of possible phenomena.

One of the ramifications of Ornstein’s theory can be found in the work of Thouvenot, who, starting in 1975, became interested in relatively Bernoulli systems and developed a “relative” version of Ornstein’s theory. Following this work, in his 1977 paper [24], he introduced the weak Pinsker property: for any $\varepsilon>0$ , $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is isomorphic to the direct product of a Bernoulli shift $\mathbf{B}$ and a system $\mathbf{X}_{\varepsilon}$ of entropy at most $\varepsilon$ :

\mathbf{X}\cong\mathbf{X}_{\varepsilon}\otimes\mathbf{B}.

(1)

For four decades, however, it was unclear whether all systems verified the weak Pinsker property. But in 2018, Austin published a paper on the subject [1] in which he proved that all ergodic systems satisfy the weak Pinsker property.

We can then iterate this splitting operation: take $(\varepsilon_{n})_{n\leq-1}$ an increasing sequence of positive numbers such that $\lim_{n\rightarrow-\infty}\varepsilon_{n}=0$ , and start by splitting $\mathbf{X}$ into

\mathbf{X}\cong\mathbf{X}_{\varepsilon_{-1}}\otimes\mathbf{B}_{-1},

then split $\mathbf{X}_{\varepsilon_{-1}}$ into

\mathbf{X}_{\varepsilon_{-1}}\cong\mathbf{X}_{\varepsilon_{-2}}\otimes\mathbf{B}_{-2},

and so on. This yields a sequence of systems $(\mathbf{X}_{\varepsilon_{n}})_{n\leq-1}$ such that, for every $n\leq-1$ , $\mathbf{X}_{\varepsilon_{n}}$ is a factor of $\mathbf{X}_{\varepsilon_{n+1}}$ . By composing the factor maps, it means that each $\mathbf{X}_{\varepsilon_{n}}$ is a factor of $\mathbf{X}$ , and therefore generates a $T$ -invariant $\sigma$ -algebra $\mathscr{F}_{n}:=\sigma(\mathbf{X}_{\varepsilon_{n}})\subset\mathscr{A}$ . Because of our iterating construction, we see that $\mathscr{F}_{n}\subset\mathscr{F}_{n+1}$ , so the sequence $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ is a filtration. This is what we call a weak Pinsker filtration on $\mathbf{X}$ (see Definition 2.20).

The purpose of this paper is to discuss the links between weak Pinsker filtrations and the structure of dynamical systems with positive entropy. Weak Pinsker filtrations fall into the category of dynamical filtrations, i.e. filtrations $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ on a dynamical system for which each $\sigma$ -algebra $\mathscr{F}_{n}$ is $T$ -invariant. A framework for the study of such filtrations was introduced in [10], and this will guide our approach of weak Pinsker filtrations. In Section 2.2, we introduce the necessary concepts from the theory of dynamical filtrations. This framework is focused on the various possible structures of filtrations whose tail $\sigma$ -algebra $\bigcap_{n\leq 0}\mathscr{F}_{n}$ is trivial, which is the type of weak Pinsker filtrations that appear on K-systems (see Theorem 2.23). Therefore, the study of weak Pinsker filtrations we suggest would mainly be aimed at classifying K-systems, and in particular non-Bernoulli K-systems.

In Section 2, we give an overview of the results and open questions that arise from the study of the properties of weak Pinsker filtrations, and their relation to the structure of the underlying dynamical system. One of those questions concerns the uniqueness, up to isomorphism, of weak Pinsker filtrations. In Section 3, we give a partial answer to this uniqueness problem in the case of Bernoulli systems. That section is based on ideas suggested to us by Thouvenot. The main result of this paper is Theorem 3.1. Finally, in Section 4, we give explicit examples of weak Pinsker filtrations, in order to give a more concrete meaning to all of those abstract notions.

2 Weak Pinsker filtrations and related questions

In this section, we introduce the notion of weak Pinsker filtrations, the tools necessary to study them and state some of the main questions concerning those filtrations. Since weak Pinsker filtrations are dynamical filtrations, we will use the framework for classifying dynamical filtrations presented in the previous section.

2.1 Basic notation

A dynamical system is a quadruple $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ such that $(X,\mathscr{A},\mu)$ is a Lebesgue probability space, and $T$ is an invertible measure-preserving transformation.

Let $\mathscr{B},\mathscr{C}\subset\mathscr{A}$ be sub- $\sigma$ -algebras. We write $\mathscr{B}\subset\mathscr{C}$ mod $\mu$ , if for every $B\in\mathscr{B}$ , there exists $C\in\mathscr{C}$ such that $\mu(B\Delta C)=0$ . Then, $\mathscr{B}=\mathscr{C}$ mod $\mu$ if $\mathscr{B}\subset\mathscr{C}$ mod $\mu$ and $\mathscr{C}\subset\mathscr{B}$ mod $\mu$ . We denote $\mathscr{B}\vee\mathscr{C}$ the smallest $\sigma$ -algebra that contains $\mathscr{B}$ and $\mathscr{C}$ . We say that $\mathscr{B}$ is a factor $\sigma$ -algebra (or $T$ -invariant $\sigma$ -algebra) if $T^{-1}(\mathscr{B})=\mathscr{B}$ mod $\mu$ . Let $\mathscr{B},\mathscr{C}$ and $\mathscr{D}$ be sub- $\sigma$ -algebras of $\mathscr{A}$ . We say that $\mathscr{B}$ and $\mathscr{C}$ are relatively independent over $\mathscr{D}$ if for any $\mathscr{B}$ -measurable bounded function $B$ and $\mathscr{C}$ -measurable bounded function $C$

\mathbb{E}[BC\,|\,\mathscr{D}]=\mathbb{E}[B\,|\,\mathscr{D}]\;\mathbb{E}[C\,|\,\mathscr{D}]\;\text{almost surely}.

In this case, we write $\mathscr{B}\perp\!\!\!\perp_{\mathscr{D}}\mathscr{C}$ . If $\mathscr{D}$ is trivial, $\mathscr{B}$ and $\mathscr{C}$ are independent, which we denote $\mathscr{B}\perp\!\!\!\perp\mathscr{C}$ .

If we have two systems $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ and $\mathbf{Y}:=(Y,\mathscr{B},\nu,S)$ , a factor map is a measurable map $\pi:X\longrightarrow Y$ such that $\pi_{*}\mu=\nu$ and $\pi\circ T=S\circ\pi$ , $\mu$ -almost surely. If such a map exists, we say that $\mathbf{Y}$ is a factor of $\mathbf{X}$ and we denote $\sigma(\pi):=\pi^{-1}(\mathscr{B})$ the $\sigma$ -algebra generated by $\pi$ . Conversely, we also say that $\mathbf{X}$ is an extension of $\mathbf{Y}$ or that $\mathbf{Y}$ is embedded in $\mathbf{X}$ . Moreover, if there exist invariant sets $X_{0}\subset X$ and $Y_{0}\subset Y$ of full measure such that $\pi:X_{0}\longrightarrow Y_{0}$ is a bijection, then $\pi$ is an isomorphism and we write $\mathbf{X}\cong\mathbf{Y}$ .

For a given factor $\sigma$ -algebra $\mathscr{B}$ , in general, the quadruple $(X,\mathscr{B},\mu,T)$ is not a dynamical system since $\mathscr{B}$ need not separate points on $X$ , and in this case $(X,\mathscr{B},\mu)$ is not a Lebesgue probability space. However, there always exist a dynamical system $\mathbf{Y}$ and a factor map $\pi:\mathbf{X}\longrightarrow\mathbf{Y}$ such that $\sigma(\pi)=\mathscr{B}$ mod $\mu$ . Moreover, this representation is not unique, but for a given factor $\mathscr{B}$ , all such systems $\mathbf{Y}$ are isomorphic and there is a canonical construction to get a system $\mathbf{X}/\!\raisebox{-2.79857pt}{$\mathscr{B}$}$ and a factor map $p_{\mathscr{B}}:\mathbf{X}\longrightarrow\mathbf{X}/\!\raisebox{-2.79857pt}{$\mathscr{B}$}$ such that $\sigma(p_{\mathscr{B}})=\mathscr{B}$ mod $\mu$ (see [6, Chapter 2, Section 2]).

2.2 Dynamical filtrations

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ . A dynamical filtration is a pair $(\mathscr{F},T)$ such that $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ is a filtration in discrete negative time (i.e. $\mathscr{F}_{n}\subset\mathscr{F}_{n+1}$ ) on $\mathscr{A}$ and each $\mathscr{F}_{n}$ is $T$ -invariant. The theory of dynamical filtrations was initiated by Paul Lanthier in [9, 10]. For our present work, the primary notion we need is a precise definition of what it means for two dynamical filtrations to be isomorphic:

Definition 2.1.

Let $(\mathscr{F},T_{1})$ be a dynamical filtration on $\mathbf{X}_{1}:=(X_{1},\mathscr{A}_{1},\mu_{1},T_{1})$ and $(\mathscr{G},T_{2})$ a dynamical filtration on $\mathbf{X}_{2}:=(X_{2},\mathscr{A}_{2},\mu_{2},T_{2})$ . We say that $(\mathscr{F},T_{1})$ and $(\mathscr{G},T_{2})$ are isomorphic if there is an isomorphism $\Phi:\mathbf{X}_{1}/\!\raisebox{-2.79857pt}{$\mathscr{F}_{0}$}\rightarrow\mathbf{X}_{2}/\!\raisebox{-2.79857pt}{$\mathscr{G}_{0}$}$ such that, for all $n\leq 0$ , $\Phi(\mathscr{F}_{n})=\mathscr{G}_{n}$ mod $\mu_{2}$ .

We will discuss two specific families of filtrations:

Definition 2.2.

Let $(\mathscr{F},T)$ be a dynamical filtration on $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ . It is of product type if there is a sequence $(\mathscr{C}_{n})_{n\leq 0}$ of mutually independent factor $\sigma$ -algebras such that

\forall n\leq 0,\;\mathscr{F}_{n}=\bigvee_{k\leq n}\mathscr{C}_{k}\;\text{ mod }\,\mu.

Definition 2.3.

Let $(\mathscr{F},T)$ be a dynamical filtration on $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ . It is Kolmogorovian if its tail $\sigma$ -algebra is trivial, i.e. if $\bigcap_{n\leq 0}\mathscr{F}_{n}=\{\varnothing,X\}$ mod $\mu$ .

In particular, because of Kolmogorov’s $0-1$ law, product type filtrations are Kolmogorovian.

In the theory of dynamical filtrations, additional properties, such as standardness and I-cosiness, appear naturally (see [10]). However, for now, we are not able to find relevant applications of those notions in the study of weak Pinsker filtrations, so we dot not discuss them in this paper. That being said, standardness and I-cosiness being looser than “product-type”, a first step in investigating further the examples of Section 4 could be to determine whether they are standard, I-cosy or neither.

2.3 Reminders on KS-entropy

The notion of entropy first appeared in mathematics in 1948, introduced by Shannon in his foundational work on information theory [21]. It is defined as follows:

Definition 2.4 (Shannon entropy).

Let $(X,\mathscr{A},\mu)$ be a probability space and $\xi:X\rightarrow A$ a random variable, with $A$ finite or countable. The Shannon entropy of $\xi$ is

H_{\mu}(\xi):=-\sum_{a\in A}\mu(\{\xi=a\})\cdot\log\mu(\{\xi=a\}).

The number $H_{\mu}(\xi)$ gives the average amount of information given by the random variable $\xi$ . If we have a probability measure $\rho$ defined directly on $A$ , we can also define the entropy of that measure

H(\rho):=-\sum_{a\in A}\rho(a)\cdot\log\rho(a).

One can also define conditional entropy: for $\chi_{1}:X\rightarrow Y_{1}$ and $\chi_{2}:X\rightarrow Y_{2}$ be two random variables, we define

H_{\mu}(\chi_{1}\,|\,\chi_{2}):=\sum_{y_{2}\in Y_{2}}\mu(\chi_{2}=y_{2})\sum_{y_{1}\in Y_{1}}\varphi(\mu(\chi_{1}=y_{1}\,|\,\chi_{2}=y_{2})),

where $\varphi(x)=-x\cdot\log(x)$ . This quantifies the missing information required to determine $\chi_{1}$ once $\chi_{2}$ is known. We refer to [3, Chapter 2, Section 6] for the basic properties of this notion. In the present work, conditional entropy will only serve as a computational tool, via Fano’s inequality. See [3, Theorem 6.2] for a proof.

Lemma 2.5 (Fano’s inequality).

Let $\chi_{1}:X\rightarrow A$ and $\chi_{2}:X\rightarrow A$ be two $A$ -valued random variables. Set $d:=\mu(\chi_{1}\neq\chi_{2})$ . We have

H_{\mu}(\chi_{1}\,|\,\chi_{2})\leq\varphi(d)+\varphi(1-d)+d\log(\#A-1).

In particular, for $\varepsilon\in]0,e^{-1}[$ , if $\chi$ is an $A$ -valued random variable such that there exists $a_{0}\in A$ satisfying $\mu(\chi=a_{0})\geq 1-\varepsilon$ , then

H_{\mu}(\chi)\leq\varepsilon(1+\log(\varepsilon^{-1})+\log(\#A-1)).

In 1958, Kolmogorov and Sinaï used entropy to quantify the randomness of measure preserving dynamical systems.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. To any random variable $\xi_{0}:X\rightarrow A$ , with $A$ finite, we associate $\xi:X\rightarrow A^{\mathbb{Z}}$ the corresponding $T$ -process

\xi:=(\xi_{n})_{n\in\mathbb{Z}}:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}.

Also, for $F\subset\mathbb{Z}$ , set $\xi_{F}:=(\xi_{n})_{n\in F}$ .

The Kolmogorov-Sinaï entropy (or KS-entropy) of a dynamical system is:

Definition 2.6 (Kolmogorov-Sinaï entropy).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. For a finite valued random variable $\xi_{0}:X\rightarrow A$ , define

h_{\mu}(\xi,T):=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}(\xi_{\llbracket 0,n\rrbracket}).

For a $T$ -invariant $\sigma$ -algebra $\mathscr{B}\subset\mathscr{A}$ , define

h_{\mu}(\mathscr{B},T):=\sup\{h_{\mu}(\xi,T)\,;\;\xi_{0}\text{ a }\mathscr{B}\text{-measurable random variable}\}.

Finally, set

h(\mathbf{X}):=h_{\mu}(\mathscr{A},T).

The KS-entropy satisfies the following continuity result:

Lemma 2.7.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system and a random variable $\xi_{0}:X\rightarrow A$ , with $A$ finite. For $\varepsilon>0$ , there exists $\delta>0$ such that, for any random variable $\zeta_{0}:X\rightarrow A$ such that $\mu(\zeta_{0}\neq\xi_{0})\leq\delta$ , we have

|h_{\mu}(\xi,T)-h_{\mu}(\zeta,T)|\leq\varepsilon.

Proof.

In this proof, we will use Fano’s inequality (Lemma 2.5). Specifically, we compute:

	$\displaystyle h(\xi,T)$	$\displaystyle=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}(\xi_{[0,n[})\leq\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}((\xi\vee\zeta)_{[0,n[})$
		$\displaystyle\leq\lim_{n\rightarrow\infty}\frac{1}{n}\left(H_{\mu}(\zeta_{[0,n[})+\sum_{j=0}^{n-1}H_{\mu}(\xi_{j}\,\|\,\zeta_{[0,n[})\right)$
		$\displaystyle\leq\lim_{n\rightarrow\infty}\frac{1}{n}\left(H_{\mu}(\zeta_{[0,n[})+\sum_{j=0}^{n-1}H_{\mu}(\xi_{j}\,\|\,\zeta_{j})\right)$
		$\displaystyle\leq h_{\mu}(\zeta,T)+H_{\mu}(\xi_{0}\,\|\,\zeta_{0})$
		$\displaystyle\leq h_{\mu}(\zeta,T)+\varphi(d)+\varphi(1-d)+d\log(\#A-1).$

where $\varphi(x)=-x\cdot\log(x)$ . And, since $\varphi$ is continuous, there exists $\delta>0$ such that, if $d\leq\delta$ , we have

h_{\mu}(\xi,T)\leq h_{\mu}(\zeta,T)+\varepsilon.

By switching $\xi$ and $\zeta$ and doing the same reasoning, we end the proof. ∎

It is useful to locate the deterministic aspects of a dynamical system. We do that by considering the Pinsker factor of a system. For any factor $\sigma$ -algebra $\mathscr{B}$ , we define

\Pi_{\mathscr{B}}=\{A\in\mathscr{B}\;|\;h(\mathbbm{1}_{A},T)=0\}.

The Pinsker factor of the system $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is then defined as $\Pi_{\mathbf{X}}:=\Pi_{\mathscr{A}}$ . We will use the following basic result, which can be found in [18, Theorem 14]:

Lemma 2.8.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system and $\mathscr{B}$ and $\mathscr{C}$ be independent factor $\sigma$ -algebras. We have

\Pi_{\mathscr{B}\vee\mathscr{C}}=\Pi_{\mathscr{B}}\vee\Pi_{\mathscr{C}}.

To be able to compute the entropy of a system, the following result proves to be most useful.

Theorem 2.9 (Kolmogorov-Sinaï).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. Consider a finite valued random variable $\xi_{0}:X\rightarrow A$ and the corresponding $T$ -process $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ . Then we have

h_{\mu}(\sigma(\xi),T)=h_{\mu}(\xi,T).

In particular, if $\xi$ is a generator of $\mathscr{A}$ (i.e. $\mathscr{A}=\sigma(\xi)$ mod $\mu$ ), then $h(\mathbf{X})=h_{\mu}(\xi,T)$ .

2.4 Ornstein’s theory and its relative version

From the definition, one easily sees that KS-entropy is invariant under isomorphism of dynamical systems, which makes it a useful tool in the classification of measure preserving dynamical systems. The most remarkable classification results concern Bernoulli and relatively Bernoulli systems:

Definition 2.10 (Bernoulli and relatively Bernoulli).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system.

We say that $\mathbf{X}$ (or $\mathscr{A}$ ) is Bernoulli if there exists a random variable $\xi_{0}:X\rightarrow A$ such that the corresponding $T$ -process $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ is i.i.d. and generates $\mathscr{A}$ , i.e. we have $\sigma(\xi)=\mathscr{A}$ mod $\mu$ .

Let $\mathscr{B}\subset\mathscr{A}$ be a factor $\sigma$ -algebra. We say that $\mathbf{X}$ (or $\mathscr{A}$ ) is relatively Bernoulli over $\mathscr{B}$ if there is an i.i.d. process of the form $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ such that $\sigma(\xi)$ is independent of $\mathscr{B}$ and $\mathscr{A}=\mathscr{B}\vee\sigma(\xi)$ mod $\mu$ .

Those two definitions coincide when $\mathscr{B}$ is the trivial factor $\sigma$ -algebra: $\mathbf{X}$ is relatively Bernoulli over $\{\varnothing,X\}$ if and only if $\mathbf{X}$ is Bernoulli.

Remark 2.11.

We can consider another approach to define Bernoulli systems: take $A$ a finite or countable set and $\rho$ a probability measure on $A$ . On the product probability space $(A^{\mathbb{Z}},\rho^{\otimes\mathbb{Z}})$ , consider the transformation

S:(a_{n})_{n\in\mathbb{Z}}\mapsto(a_{n+1})_{n\in\mathbb{Z}}.

The map $S$ is called the shift on $A^{\mathbb{Z}}$ . One can easily check that $\rho^{\otimes\mathbb{Z}}$ is $S$ -invariant. Therefore, this yields a measure preserving dynamical system

\mathbf{B}:=(A^{\mathbb{Z}},\rho^{\otimes\mathbb{Z}},S),

(2)

which is called a Bernoulli shift. Then a system is Bernoulli if and only if it is isomorphic to a Bernoulli shift. Similarly, we can see that a system $\mathbf{X}$ is relatively Bernoulli over a factor $\sigma$ -algebra $\mathscr{B}$ if and only if $\mathbf{X}$ is isomorphic to a system of the form $\mathbf{Y}\otimes\mathbf{B}$ via a factor map $\varphi:\mathbf{X}\longrightarrow\mathbf{Y}\times\mathbf{B}$ such that $\sigma(\pi_{\mathbf{Y}}\circ\varphi)=\mathscr{B}$ mod $\mu$ (where $\pi_{\mathbf{Y}}$ is the projection of $\mathbf{Y}\otimes\mathbf{B}$ onto $\mathbf{Y}$ ).

Using Theorem 2.9, it is easy to compute the entropy of a Bernoulli system. Let $\xi$ be an $i.i.d.$ process on $\mathbf{X}$ that generates $\mathscr{A}$ . We then have

	$\displaystyle h(\mathbf{X})$	$\displaystyle=h_{\mu}(\mathscr{A},T)=h_{\mu}(\xi,T)=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}(\xi_{\llbracket 0,n\rrbracket})$
		$\displaystyle=\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=0}^{n}H_{\mu}(\xi_{i})=H_{\mu}(\xi_{0}).$

In particular, if $\mathbf{X}$ is isomorphic to a system of the form (2), we get

h(\mathbf{X})=h(\mathbf{B})=-\sum_{a\in A}\rho(a)\log(\rho(a)).

Since, to be isomorphic, two systems need to have the same entropy, this computation enables us to get a non-isomorphism result between any two Bernoulli systems of different entropy. Remarkably, Ornstein proved that the converse is also true:

Theorem 2.12 (Ornstein [11, 12]).

If $\mathbf{X}$ and $\mathbf{Y}$ are Bernoulli systems such that $h(\mathbf{X})=h(\mathbf{Y})$ , then $\mathbf{X}\cong\mathbf{Y}$ .

This means that the KS-entropy gives a complete classification of Bernoulli systems. An outstanding result that emerged from Ornstein’s theory was a criterion to characterize Bernoulli systems: finite determination. However, although this notion is useful in proving abstract results, when studying a given system, it is not easy to know whether or not it is finitely determined. Because of that, another criterion called very weak Bernoullicity was developed (see [16, Section 7]). This is the criterion we are interested in.

For the remainder of this section, we assume that the processes are defined on finite alphabets. We first need a technical definition. Given a finite alphabet $A$ , an integer $\ell\geq 1$ and two words $\mathbf{a},\mathbf{b}\in A^{\ell}$ of length $\ell$ on $A$ , we define the normalized Hamming metric between $\mathbf{a}$ and $\mathbf{b}$ as:

d_{\ell}(\mathbf{a},\mathbf{b}):=\frac{1}{\ell}\#\{i\in\llbracket 1,\ell\rrbracket\,|\,a_{i}\neq b_{i}\},

where $\mathbf{a}=(a_{1},...,a_{\ell})$ and $\mathbf{b}=(b_{1},...,b_{\ell})$ . We then consider the corresponding transportation metric on $\mathscr{P}(A^{\ell})$ :

\forall\mu,\nu\in\mathscr{P}(A^{\ell}),\;\bar{d_{\ell}}(\mu,\nu):=\inf\left\{\int d_{\ell}(\mathbf{a},\mathbf{b})d\lambda(\mathbf{a},\mathbf{b})\,;\;\lambda\text{ a coupling of $\mu$ and $\nu$}\right\}.

Then a process $\xi$ is said to be very weak Bernoulli if, for some $\ell\geq 1$ , the conditional law of $\xi_{[0,\ell[}$ given the past of $\xi$ is close enough to the law of $\xi_{[0,\ell[}$ in the $\bar{d_{\ell}}$ metric. More formally, we state:

Definition 2.13 (Very weak Bernoulli).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system, equipped with a $T$ -process $\xi$ taking values in a finite alphabet. We say that $\xi$ is very weak Bernoulli if, for every $\varepsilon>0$ , there exists $\ell\geq 1$ such that for every $m\geq 1$ , we have

\int\bar{d_{\ell}}\left(\nu_{\ell}(\cdot\,|\,\mathbf{a}_{[-m,0[}),\nu_{\ell}(\cdot)\right)d\nu(\mathbf{a})\leq\varepsilon,

where $\nu$ is the law of $\xi$ and, for $I\subset\mathbb{Z}$ , $\nu_{\ell}(\cdot\,|\,\mathbf{a}_{I})$ is the conditional law of $\xi_{[0,\ell[}$ given that $\xi_{I}$ equals $\mathbf{a}_{I}$ .

If $\mathscr{A}=\sigma(\xi)$ , we say that $\mathbf{X}$ (or $\mathscr{A}$ ) is very weak Bernoulli.

The fact that very weak Bernoullicity characterizes Bernoulli systems can be stated as follows:

Theorem 2.14 (see [16, 17]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. A $T$ -process $\xi$ on $\mathbf{X}$ is very weak Bernoulli if and only if $\sigma(\xi)$ is Bernoulli.

Following the work of Ornstein, Thouvenot studied relatively Bernoulli systems and adapted the definitions of finite determination and very weak Bernoullicity to get criteria that characterize relatively Bernoulli systems. Here we give his adaptation of very weak Bernoullicity:

Definition 2.15 (Relatively very weak Bernoulli).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system, equipped with two $T$ -processes $\xi$ and $\eta$ with finite alphabets. We say that $\xi$ is relatively very weak Bernoulli over $\eta$ if, for every $\varepsilon>0$ , there exists $\ell\geq 1$ such that for every $m\geq 1$ and for all $k\geq 1$ large enough, we have

\int\bar{d_{\ell}}\left(\nu_{\ell}(\cdot\,|\,\mathbf{a}_{[-m,0[},\mathbf{b}_{[-k,k]}),\nu_{\ell}(\cdot\,|\,\mathbf{b}_{[-k,k]})\right)d\nu(\mathbf{a},\mathbf{b})\leq\varepsilon,

where $\nu$ is the law of $(\xi,\eta)$ and, for $I,J\subset\mathbb{Z}$ , $\nu_{\ell}(\cdot\,|\,\mathbf{a}_{I},\mathbf{b}_{J})$ is the conditional law of $\xi_{[0,\ell[}$ given that $\xi_{I}$ equals $\mathbf{a}_{I}$ and that $\eta_{J}$ equals $\mathbf{b}_{J}$ .

If $\mathscr{A}=\sigma(\xi)$ and $\mathscr{B}=\sigma(\eta)$ , we say that $\mathbf{X}$ (or $\mathscr{A}$ ) is relatively very weak Bernoulli over $\mathscr{B}$ .

Many early results from Thouvenot’s theory were stated for relatively finitely determined systems. However, for our work, relative very weak Bernoullicity is a more convenient notion. Fortunately, we have the following equivalence, which enables us to apply to relatively very weak Bernoulli processes results originally stated for relatively finitely determined processes:

Theorem 2.16 (see [20]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic system and $\xi$ and $\eta$ be $T$ -processes with finite alphabets defined on $\mathbf{X}$ . Then $\xi$ is relatively very weak Bernoulli over $\eta$ if and only if it is relatively finitely determined over $\eta$ .

We give a summary of the results we will use:

Lemma 2.17.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a finite entropy dynamical system and $\mathscr{B}$ a factor $\sigma$ -algebra. Let $\xi$ and $\eta$ be $T$ -processes with finite alphabets defined on $\mathbf{X}$ such that $\mathscr{A}=\sigma(\xi)$ and $\mathscr{B}=\sigma(\eta)$ . If $\xi$ is relatively very weak Bernoulli over $\eta$ , then

(i)

$\mathbf{X}$ is relatively Bernoulli over $\mathscr{B}$ ,
(ii)

any $T$ -process $\rho$ on $\mathbf{X}$ is relatively very weak Bernoulli over $\eta$ ,
(iii)

for any factor $\sigma$ -algebra $\mathscr{C}\subset\mathscr{A}$ , $\mathscr{B}\vee\mathscr{C}$ is relatively very weak Bernoulli over $\mathscr{B}$ ,
(iv)

any factor $\sigma$ -algebra $\mathscr{C}\subset\mathscr{A}$ that is independent from $\mathscr{B}$ is Bernoulli.

Proof.

We prove the lemma mainly by referring to the literature. The statement (i) follows from [26, Proposition 5] and Theorem 2.16. Then (ii) follows from [26, Proposition 4] and Theorem 2.16, and (iii) follows from (ii). Let us prove (iv): take $\rho$ a process on $\mathbf{X}$ such that $\mathscr{C}=\sigma(\rho)$ mod $\mu$ . From (ii), we know that $\rho$ is relatively very weak Bernoulli over $\eta$ . However, since $\mathscr{C}$ is independent of $\mathscr{B}$ , $\rho$ is independent of $\eta$ . One can then notice that if we add this independence in the definition of relative very weak Bernoullicity, we end up with the fact that $\rho$ is very weak Bernoulli. Finally, Theorem 2.14 tells us that $\mathscr{C}=\sigma(\rho)$ is Bernoulli. ∎

We have just given many definitions and results concerning processes with finite alphabets, and the $\sigma$ -algebras they generate. The following result from Krieger tells that it is applicable on any finite entropy system:

Theorem 2.18 (See [8]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system and $\mathscr{B}\subset\mathscr{A}$ be a factor $\sigma$ -algebra. If $h_{\mu}(\mathscr{B},T)<\infty$ , there exists a finite alphabet $A$ and a random variable $\xi_{0}:X\rightarrow A$ such that

\mathscr{B}=\sigma(\{\xi_{0}\circ T^{n}\}_{n\in\mathbb{Z}})\text{ mod }\mu.

We say that $\xi$ is a finite generator of $\mathscr{B}$ .

2.5 Positive entropy systems and weak Pinsker filtrations

In 2018, Austin proved the following:

Theorem 2.19 (Austin, 2018, [1]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system. For every $\varepsilon>0$ there exists a factor $\sigma$ -algebra $\mathscr{B}$ such that:

•

$h_{\mu}(\mathscr{B},T)\leq\varepsilon$ ,
•

$\mathbf{X}$ is relatively Bernoulli over $\mathscr{B}$ .

In other words, $\mathbf{X}$ has the weak Pinsker property (as in (1)).

Definition 2.20.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system and $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ a dynamical filtration on $\mathbf{X}$ such that $\mathscr{F}_{0}=\mathscr{A}$ . We say that $\mathscr{F}$ is a weak Pinsker filtration if

•

for every $n\leq-1$ , $\mathscr{F}_{n+1}$ is relatively Bernoulli over $\mathscr{F}_{n}$ ,
•

and

$\lim_{n\rightarrow-\infty}h_{\mu}(\mathscr{F}_{n},T)=0.$

Then, by iterating Austin’s theorem, we see that we can obtain weak Pinsker filtrations on any ergodic system:

Proposition 2.21.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. If $\mathbf{X}$ is ergodic, there exists a weak Pinsker filtration on $\mathbf{X}$ . More specifically, for every increasing sequence $(\varepsilon_{n})_{n\leq-1}$ such that $\varepsilon_{-1}\leq h(\mathbf{X})$ and $\lim_{n\rightarrow-\infty}\varepsilon_{n}=0$ , there exists a weak Pinsker filtration $(\mathscr{F}_{n})_{n\leq 0}$ such that $\forall n\leq-1$ , $h_{\mu}(\mathscr{F}_{n},T)=\varepsilon_{n}$ .

This simply tells us that weak Pinsker filtrations exist, but gives no explicit description. To start understanding those filtrations better, we can first link them to the Pinsker factor of the system:

Proposition 2.22.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system and $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ a weak Pinsker filtration on $\mathbf{X}$ . Then the tail $\sigma$ -algebra $\mathscr{F}_{-\infty}:=\bigcap_{n\leq 0}\mathscr{F}_{n}$ is the Pinsker factor of $\mathbf{X}$ .

Proof.

Let $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ be a weak Pinsker filtration on $\mathbf{X}$ . Since, for $n_{0}\leq 0$ , $\mathscr{F}_{-\infty}\subset\mathscr{F}_{n_{0}}$ , it follows that $h_{\mu}(\mathscr{F}_{-\infty},T)\leq h_{\mu}(\mathscr{F}_{n_{0}},T)$ . Then, by taking $n_{0}\rightarrow-\infty$ , this yields $h_{\mu}(\mathscr{F}_{-\infty},T)=0$ . Therefore, $\mathscr{F}_{-\infty}\subset\Pi_{\mathbf{X}}$ .

Conversely, let us show that, for every $n\leq 0$ , $\Pi_{\mathbf{X}}\subset\mathscr{F}_{n}$ . Since $\mathscr{F}$ is a weak Pinsker filtration, we can choose $\mathscr{B}_{n}\subset\mathscr{A}$ a Bernoulli factor $\sigma$ -algebra such that

\mathscr{F}_{n}\perp\!\!\!\perp\mathscr{B}_{n}\;\text{ and }\;\mathscr{F}_{n}\vee\mathscr{B}_{n}=\mathscr{A}\text{ mod }\mu.

Then we use Lemma 2.8:

\Pi_{\mathbf{X}}=\Pi_{\mathscr{A}}=\Pi_{\mathscr{F}_{n}}\vee\Pi_{\mathscr{B}_{n}}=\Pi_{\mathscr{F}_{n}}\subset\mathscr{F}_{n},

because, $\mathscr{B}_{n}$ being Bernoulli, its Pinsker factor is trivial. ∎

Weak Pinsker filtrations are dynamical filtrations, and in Section 2.2, we introduced tools to classify dynamical filtrations, which we use here. While trying to connect the properties of a weak Pinsker filtration with the properties of the underlying system, we get the following simple results:

Theorem 2.23.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system and $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ be a weak Pinsker filtration on $\mathbf{X}$ . Then

(i)

$\mathbf{X}$ is a K-system if and only if $\mathscr{F}$ is Kolmogorovian, i.e. $\bigcap_{n\leq 0}\mathscr{F}_{n}=\{\varnothing,X\}$ mod $\mu$ .
(ii)

If the filtration $\mathscr{F}$ is of product-type, then $\mathbf{X}$ is Bernoulli.

Proof.

We know that a system is K if and only if its Pinsker factor is trivial. Then the equivalence in (i) follows from Proposition 2.22.

We now prove (ii). Assume that $\mathscr{F}$ is a weak Pinsker filtration of product type. This means that there exists a sequence $(\mathscr{B}_{n})_{n\leq 0}$ of mutually independent factor $\sigma$ -algebras such that $\mathscr{F}_{n}=\bigvee_{k\leq n}\mathscr{B}_{k}$ . Let $n\leq 0$ . We know that $\mathscr{F}_{n}$ is relatively Bernoulli over $\mathscr{F}_{n-1}$ and that $\mathscr{B}_{n}$ is independent of $\mathscr{F}_{n-1}$ . So, Lemma 2.17 tells us that $\mathscr{B}_{n}$ is Bernoulli. Therefore, we have $\mathscr{A}=\mathscr{F}_{0}=\bigvee_{k\leq 0}\mathscr{B}_{k}$ , which shows that we can write $\mathscr{A}$ as a product of mutually independent Bernoulli factors. Hence, $\mathscr{A}$ is Bernoulli. ∎

However, this result leaves many open questions. First, we can ask if the converse of (ii) is true. Since we remark at the end of Section 2.6 that, on a Bernoulli shift, there is at least one weak Pinsker filtration of product type, the converse of (ii) is equivalent to the uniqueness problem given in Question 2.27. Another area that is left open is to consider other properties from the theory of dynamical filtrations, like standardness or I-cosiness, and wonder what it implies of the system if a weak Pinsker filtrations has those properties:

Question 2.24.

What can we say about $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ if there is a weak Pinsker filtration $\mathscr{F}$ on $\mathbf{X}$ that is standard ? In that case, is $\mathbf{X}$ Bernoulli ? And if the weak Pinsker filtration is I-cosy ?

Our hope is that answering those questions could give additional information on the structure of non-Bernoulli K-systems. For precise definitions of standardness and I-cosiness, see [10] or [2].

2.6 The uniqueness problem

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system. As mentioned in Proposition 2.21, the fact that every ergodic systems satisfies the weak Pinsker property implies that, for any given increasing sequence $(\varepsilon_{n})_{n\leq-1}$ that goes to $0$ in $-\infty$ such that $\varepsilon_{-1}\leq h(\mathbf{X})$ , there exits a weak Pinsker filtration $\mathscr{F}$ on $\mathbf{X}$ such that $h_{\mu}(\mathscr{F}_{n},T)=\varepsilon_{n}$ . But this filtration is not unique. Indeed, in the splitting result given by the weak Pinsker property (1), the choice of the factor $\sigma$ -algebra generated by $\mathbf{X}_{\varepsilon}$ is not unique. For example, take a system of the form

\mathbf{X}:=\mathbf{Z}\otimes\mathbf{B}_{1}\otimes\mathbf{B}_{2},

where $\mathbf{Z}$ is a $0$ entropy system and $\mathbf{B}_{1}$ and $\mathbf{B}_{2}$ are Bernoulli shifts of equal entropy. Note that $\mathbf{Z}\otimes\mathbf{B}_{1}$ and $\mathbf{Z}\otimes\mathbf{B}_{2}$ generate two different factor $\sigma$ -algebras on $\mathbf{X}$ . But they are both factors over which $\mathbf{X}$ is relatively Bernoulli, and they have the same entropy. However, we can notice in this example that $\mathbf{Z}\otimes\mathbf{B}_{1}$ and $\mathbf{Z}\otimes\mathbf{B}_{2}$ are isomorphic. This observation hints to a general result:

Theorem 2.25 (From Thouvenot in [25]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ and $\mathbf{Y}:=(Y,\mathscr{B},\nu,S)$ be ergodic dynamical systems and $\mathbf{B}$ be a Bernoulli shift of finite entropy. If $\mathbf{X}\otimes\mathbf{B}$ and $\mathbf{Y}\otimes\mathbf{B}$ are isomorphic, then $\mathbf{X}$ and $\mathbf{Y}$ are isomorphic.

Proof.

This proof relies on the weak Pinsker property of $\mathbf{X}$ and $\mathbf{Y}$ , and Lemma 2.17. We also use many times that Bernoulli shifts with the same entropy are isomorphic.

Since $\mathbf{X}\otimes\mathbf{B}$ and $\mathbf{Y}\otimes\mathbf{B}$ are isomorphic, we have:

\displaystyle h(\mathbf{X})=h(\mathbf{X}\otimes\mathbf{B})-h(\mathbf{B})=h(\mathbf{Y}\otimes\mathbf{B})-h(\mathbf{B})=h(\mathbf{Y}).

Set $a:=h(\mathbf{X})=h(\mathbf{Y})$ . We can then apply the weak Pinsker property of $\mathbf{X}$ and $\mathbf{Y}$ to find two systems $\hat{\mathbf{X}}$ , $\hat{\mathbf{Y}}$ and a Bernoulli shift $\hat{\mathbf{B}}$ such that

h(\hat{\mathbf{X}})=h(\hat{\mathbf{Y}})\leq a/3,

and

\mathbf{X}\cong\hat{\mathbf{X}}\otimes\hat{\mathbf{B}}\;\text{ and }\;\mathbf{Y}\cong\hat{\mathbf{Y}}\otimes\hat{\mathbf{B}}.

This implies

\hat{\mathbf{X}}\otimes(\hat{\mathbf{B}}\otimes\mathbf{B})\cong\hat{\mathbf{Y}}\otimes(\hat{\mathbf{B}}\otimes\mathbf{B}).

In other words, there is a system $\mathbf{Z}$ and two factor maps $p_{\hat{\mathbf{X}}}:\mathbf{Z}\longrightarrow\hat{\mathbf{X}}$ and $p_{\hat{\mathbf{Y}}}:\mathbf{Z}\longrightarrow\hat{\mathbf{Y}}$ such that $\mathbf{Z}$ is relatively Bernoulli over $p_{\hat{\mathbf{X}}}$ and relatively Bernoulli over $p_{\hat{\mathbf{Y}}}$ . But then, Lemma 2.17 tells us that the factor $\sigma$ -algebra $\sigma(p_{\hat{\mathbf{X}}}\vee p_{\hat{\mathbf{Y}}})$ is relatively very weak Bernoulli over $p_{\hat{\mathbf{X}}}$ and relatively very weak Bernoulli over $p_{\hat{\mathbf{Y}}}$ . Therefore, there exist a Bernoulli shift $\tilde{\mathbf{B}}$ and two factor maps $\varphi_{1}:\mathbf{Z}\longrightarrow\tilde{\mathbf{B}}$ and $\varphi_{2}:\mathbf{Z}\longrightarrow\tilde{\mathbf{B}}$ such that $\varphi_{1}\perp\!\!\!\perp p_{\hat{\mathbf{X}}}$ , $\varphi_{2}\perp\!\!\!\perp p_{\hat{\mathbf{Y}}}$ and

\sigma(p_{\hat{\mathbf{X}}}\vee\varphi_{1})=\sigma(p_{\hat{\mathbf{X}}}\vee p_{\hat{\mathbf{Y}}})=\sigma(p_{\hat{\mathbf{Y}}}\vee\varphi_{2}).

This implies that

\hat{\mathbf{X}}\otimes\tilde{\mathbf{B}}\cong\hat{\mathbf{Y}}\otimes\tilde{\mathbf{B}}.

But, since we chose to have $h(\hat{\mathbf{X}})=h(\hat{\mathbf{Y}})\leq a/3$ , we get

\displaystyle h(\tilde{\mathbf{B}})\leq h(p_{\hat{\mathbf{X}}}\vee p_{\hat{\mathbf{Y}}})\leq h(\hat{\mathbf{X}})+h(\hat{\mathbf{Y}})\leq 2a/3\leq h(\hat{\mathbf{B}}).

Given a last Bernoulli shift $\overline{\mathbf{B}}$ of entropy $h(\hat{\mathbf{B}})-h(\tilde{\mathbf{B}})$ we get $\hat{\mathbf{B}}\cong\tilde{\mathbf{B}}\otimes\overline{\mathbf{B}}$ and

\displaystyle\mathbf{X}\cong\hat{\mathbf{X}}\otimes\hat{\mathbf{B}}\cong\hat{\mathbf{X}}\otimes\tilde{\mathbf{B}}\otimes\overline{\mathbf{B}}\cong\hat{\mathbf{Y}}\otimes\tilde{\mathbf{B}}\otimes\overline{\mathbf{B}}\cong\hat{\mathbf{Y}}\otimes\hat{\mathbf{B}}\cong\mathbf{Y}.

∎

As a consequence of this result, we see that if $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ and $\mathscr{G}:=(\mathscr{G}_{n})_{n\leq 0}$ are two weak Pinsker filtrations on $\mathbf{X}$ such that, for all $n\leq 0$ , $h_{\mu}(\mathscr{F}_{n},T)=h_{\mu}(\mathscr{G}_{n},T)$ , then we must have that, for each $n\leq 0$ , $\mathbf{X}/\!\raisebox{-2.79857pt}{$\mathscr{F}_{n}$}\cong\mathbf{X}/\!\raisebox{-2.79857pt}{$\mathscr{G}_{n}$}$ .

However, this only gives “local isomorphisms”, and it does not necessarily mean that the filtrations $\mathscr{F}$ and $\mathscr{G}$ are isomorphic (according to the notion of isomorphism introduced in Definition 2.1). Therefore, the following is still an open question:

Question 2.26.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system. Are all weak Pinsker filtrations on $\mathbf{X}$ with the same entropy isomorphic ?

This question is what we call the uniqueness problem.

If $\mathbf{X}$ is a Bernoulli shift, and if we take an increasing sequence $(\varepsilon_{n})_{n\leq 0}$ such that $\varepsilon_{0}=h(\mathbf{X})$ , we can take Bernoulli shifts $(\mathbf{B}_{n})_{n\leq 0}$ such that $h(\mathbf{B}_{n})=\varepsilon_{n}-\varepsilon_{n-1}$ , and define the system

\mathbf{B}:=\bigotimes_{n\leq 0}\mathbf{B}_{n}.

It is a Bernoulli shift of entropy $\varepsilon_{0}=h(\mathbf{X})$ , so it is isomorphic to $\mathbf{X}$ . Through this isomorphism, the factors of the form $\bigotimes_{k\leq n}\mathbf{B}_{k}$ generate a product type weak Pinsker filtration on $\mathbf{X}$ . Therefore, in the case where $\mathbf{X}$ is a Bernoulli shift, the uniqueness problem becomes:

Question 2.27.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a Bernoulli shift. Are all weak Pinsker filtrations on $\mathbf{X}$ of product type ?

3 Uniqueness problem on Bernoulli systems

In this section, we present our efforts to tackle Question 2.27. The ideas developed here come from discussions with Jean-Paul Thouvenot, and we thank him for those insights. Specifically, we are going to show:

Theorem 3.1.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a Bernoulli system and let $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ be a weak Pinsker filtration. There exists some sub-sequence $(\mathscr{F}_{n_{k}})_{k\leq 0}$ which is a weak Pinsker filtration of product type.

The fact that we are only able to describe the structure of a sub-sequence of $\mathscr{F}$ , for now, seems to be significant. Indeed, we can compare that result to a well known result from Vershik about static filtrations on a probability space: any filtration whose tail $\sigma$ -algebra $\bigcap_{n\leq 0}\mathscr{F}_{n}$ is trivial has a sub-sequence that is standard (see [5, Theorem 3]). However there are many examples of non-standard filtrations with trivial tail $\sigma$ -algebra. Therefore, although the context of Vershik’s result is very different, it emphasizes that Theorem 3.1 does not give a complete answer to Question 2.27.

The main step in proving Theorem 3.1 is contained in the following proposition:

Proposition 3.2.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a Bernoulli system of finite entropy and $\mathcal{P}_{0}:X\rightarrow A$ a finite generator of $\mathscr{A}$ , i.e. a finite valued random variable such that $\mathscr{A}=\sigma(\{\mathcal{P}_{0}\circ T^{n}\}_{n\in\mathbb{Z}})$ . For every $\varepsilon>0$ , there exists $\delta>0$ such that, if $\mathscr{H}\subset\mathscr{A}$ is a factor $\sigma$ -algebra such that $\mathbf{X}$ is relatively Bernoulli over $\mathscr{H}$ , and if $h_{\mu}(\mathscr{H},T)\leq\delta$ , there is a Bernoulli factor $\sigma$ -algebra $\mathscr{B}$ such that

(i)

$\mathscr{B}\perp\!\!\!\perp\mathscr{H}$ ,
(ii)

$\mathscr{A}=\mathscr{H}\vee\mathscr{B}$ mod $\mu$ ,
(iii)

and $\mathcal{P}_{0}\preceq_{\varepsilon}\mathscr{B}$ .

In this proposition, Krieger’ theorem (Theorem 2.18) ensures the existence of a finite generator $\mathcal{P}$ since $\mathbf{X}$ has finite entropy. The notation “ $\mathcal{P}_{0}\preceq_{\varepsilon}\mathscr{B}$ ”, which we use many times below, means that there exists a $\mathscr{B}$ -measurable random variable $\mathcal{Q}_{0}$ such that $\mu(\mathcal{P}_{0}\neq\mathcal{Q}_{0})\leq\varepsilon$ .

The existence of a Bernoulli factor satisfying (i) and (ii) is simply the definition of relative Bernoullicity, the important part of this proposition is the ability to build a Bernoulli complement that satisfies (iii). Then iterating this result will yield Theorem 3.1 (see Section 3.3).

3.1 The technical lemma

In this section, we tackle the main technical and constructive part of the proof of Proposition 3.2. It is contained in Lemma 3.7.

In Section 2.4, we introduced the notion of very weak Bernoullicity, which gives a characterization of Bernoulli systems. Here, we use another equivalent notion: extremality, due to Thouvenot (see [27, Definition 6.3]).

Definition 3.3.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system and $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ be a process where $\xi_{0}$ takes values in some finite alphabet $A$ . We say that $\xi$ is extremal if, for every $\varepsilon>0$ , there exist $\delta>0$ and $N\in\mathbb{N}$ , such that for every $\ell\geq N$ and every random variable $\mathcal{Q}:X\rightarrow B$ with $\#B\leq 2^{\delta\ell}$ , there is a set $B_{0}\subset B$ such that $\mu(\mathcal{Q}\in B_{0})\geq 1-\varepsilon$ and for $b\in B_{0}$ , we have:

\bar{d_{\ell}}(\nu_{\ell}(\cdot\,|\,b),\nu_{\ell}(\cdot))\leq\varepsilon,

where $\nu_{\ell}$ is the law of $\xi_{[0,\ell[}$ and $\nu_{\ell}(\cdot\,|\,b)$ is the law of $\xi_{[0,\ell[}$ given that $\mathcal{Q}$ equals $b$ .

In [27, Theorem 6.4], it is shown that extremality is equivalent to very weak Bernoullicity (and hence to Bernoullicity). In particular, we will use the fact that any process defined on a Bernoulli system is extremal.

The proof of Lemma 3.7 uses many methods that are usual in Ornstein’s theory of Bernoulli shifts (a presentation can be found in [16] or [22]). Therefore, we need to introduce some commonly used notions and results from that theory. The following combinatorial result is frequently used in Ornstein’s theory:

Lemma 3.4 (Hall’s marriage lemma [7]).

Let $E$ and $F$ be finite sets, and $\{J_{e}\}_{e\in E}$ be a family of subsets of $F$ : $\forall e\in E,J_{e}\subset F$ . There exists an injective map $\psi:E\rightarrow F$ such that $\forall e\in E,\psi(e)\in J_{e}$ if, and only if for every $I\subset E$ , we have

\#I\leq\#\bigcup_{e\in I}J_{e}.

The main way in which the entropy of the processes is used in our arguments comes from the Shannon-McMillan-Breiman Theorem (see [3, Theorem 13.1]):

Theorem 3.5.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system and $\xi_{0}:X\rightarrow A$ . For $\boldsymbol{a}\in A^{[0,n[}$ , define

p_{n}(\boldsymbol{a}):=\mu(\xi_{[0,n[}=\boldsymbol{a}).

We have

\lim_{n\rightarrow\infty}-\frac{1}{n}\log(p_{n}(\xi_{[0,n[}))=h_{\mu}(\xi,T),\;\mu\text{-almost surely}.

In particular, we also have the convergence in probability: for every $\varepsilon>0$ , there exists $N\geq 1$ such that for every $n\geq N$ , there exists a set $\mathcal{A}_{n}\subset A^{[0,n[}$ such that $\mu(\xi_{[0,n[}\in\mathcal{A}_{n})\geq 1-\varepsilon$ and for every $\boldsymbol{a}\in\mathcal{A}_{n}$ ,

2^{-(h_{\mu}(\xi,T)+\varepsilon)n}\leq\mu(\xi_{[0,n[}=\boldsymbol{a})\leq 2^{-(h_{\mu}(\xi,T)-\varepsilon)n}.

We also need to introduce another tool that is commonly used in Ornstein’s theory: Rokhlin towers. On a dynamical system $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ , to get a tower of height $n$ , we need a set $F$ such that the sets $T^{j}F$ , for $0\leq j\leq n-1$ are disjoint. Then the family $\mathcal{T}:=(F,TF,...,T^{n-1}F)$ is what we call a Rokhlin tower, or, in short, a tower. However, we will also refer to the set $\bigsqcup_{j=0}^{n-1}T^{j}F$ as a tower. In particular, many times, we will write $\mu(\mathcal{T})$ for $\mu(\bigsqcup_{j=0}^{n-1}T^{j}F)$ . The following result guaranties that Rokhlin of arbitrary height and total measure almost $1$ exist under quite general conditions:

Proposition 3.6 (See [22]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system and $\xi_{0}$ a finite valued random variable. Assume that $\mu$ is non-atomic. For all $n\geq 1$ and $\varepsilon>0$ , there exists a measurable set $F\subset X$ such that the sets $T^{j}F$ , for $j\in[0,n[$ , are disjoint, $\mu(\bigcup_{j=0}^{n-1}T^{j}F)\geq 1-\varepsilon$ and $\mathcal{L}(\xi_{0}\,|\,F)=\mathcal{L}(\xi_{0})$ .

The set $F$ is called the base of the tower $\mathcal{T}$ and the sets $T^{j}F$ are the levels. For any set $E\subset F$ , the family

C_{E}:=\{T^{j}E\}_{0\leq j\leq n-1}

is a tower, and we say that it is a column of $\mathcal{T}$ . If $\xi_{0}:X\rightarrow A$ is a random variable, we will be interested in the columns defined by sets of the form $F_{\boldsymbol{a}}:=F\cap\{\xi_{[0,n[}=\boldsymbol{a}\}$ with $\boldsymbol{a}\in A^{[0,n[}$ . We say that $\boldsymbol{a}$ is the $\xi$ -name of the column $C_{\boldsymbol{a}}:=C_{F_{\boldsymbol{a}}}$ . The columns $\{C_{\boldsymbol{a}}\}_{\boldsymbol{a}\in A^{[0,n[}}$ give a partition of the levels of $\mathcal{T}$ . Now, conversely, assume that we have a partition of $F$ given by sets $E_{1},...,E_{p}$ , then the columns $C_{E_{1}},...,C_{E_{p}}$ give a partition of the levels of $\mathcal{T}$ . If, moreover, we associate to each column $C_{E_{i}}$ a name $\boldsymbol{a}^{(i)}\in A^{[0,n[}$ of length $n$ , we can define a random variable $\xi_{0}$ on the levels of $\mathcal{T}$ so that, for every $i$ , we have $C_{E_{i}}=C_{\boldsymbol{a}^{(i)}}$ . We obtain this random variable simply by setting, for $i\in\llbracket 1,p\rrbracket,j\in\llbracket 0,n\llbracket$

\xi_{0}=\boldsymbol{a}_{j}^{(i)}\;\text{ on }\;T^{j}E_{i}.

This is the framework we will use to construct our random variables. We are now ready to turn our attention to the following:

Lemma 3.7.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a Bernoulli system of finite entropy and $\mathcal{P}_{0}:X\rightarrow A$ a finite generator of $\mathscr{A}$ . For every $\varepsilon>0$ , there exists $\delta>0$ satisfying the following:

•

if $\mathcal{H}_{0}:X\rightarrow H$ is a finite valued random variable such that $h_{\mu}(\mathcal{H},T)\leq\delta$ ,
•

and $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ is a $B$ -valued (for some finite set $B$ ) i.i.d. process independent from $\mathcal{H}$ such that $\mathscr{A}=\sigma(\mathcal{H})\vee\sigma(\xi)$ mod $\mu$ ,

then for any $\alpha>0$ , there exists a process $\tilde{\xi}$ such that

(i)

$\bar{d}_{1}(\mathcal{L}(\xi_{0}),\mathcal{L}(\tilde{\xi}_{0}))\leq\alpha$ ,
(ii)

$0\leq h_{\mu}(\mathcal{H}\vee\xi,T)-h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq\alpha$ ,
(iii)

and $\mathcal{P}_{0}\preceq_{\varepsilon}\sigma(\tilde{\xi})$ .

The proof of the lemma being quite intricate, we start by giving a sketch of the proof. First, we will need a Rokhlin tower $\mathcal{T}_{n}$ of very large height $n$ . This tower is then divided into the columns $C_{\boldsymbol{h}}$ (see (10)) generated by $\mathcal{H}$ . Each of those columns is then divided into sub-columns $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ (see (14)) generated by $\xi$ . Because $\mathcal{H}\vee\xi$ generates $\mathscr{A}$ , we can approach $\mathcal{P}_{0}$ by some random variable $\tilde{\mathcal{P}}_{0}$ depending on finitely many coordinates of $\mathcal{H}\vee\xi$ . It enables us to associate to each $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ a word $\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b})$ which gives a good approximation of $\mathcal{P}_{0}$ on the levels of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ . We will define $\tilde{\xi}_{0}$ by giving $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ a new $\tilde{\xi}$ -name, to replace $\boldsymbol{b}$ . Our goal is to choose those names so that we can get a good approximation of $\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b})$ by simply knowing the $\tilde{\xi}$ -name of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ , regardless of $\boldsymbol{h}$ . To do that, we fix a column $C_{\boldsymbol{h}_{0}}$ and use it as a “model” for the other columns. Then the extremality of $\mathcal{P}$ comes into play: it tells us, for most choices of $\boldsymbol{h}$ , the families $\{\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h}_{0},\boldsymbol{b})\}_{\boldsymbol{b}\in\mathcal{B}_{n}}$ and $\{\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b})\}_{\boldsymbol{b}\in\mathcal{B}_{n}}$ are quite similar. More specifically, we show that, for most $\boldsymbol{b}$ , there are names $\tilde{\boldsymbol{b}}$ such that ${d_{n}(\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h}_{0},\tilde{\boldsymbol{b}}),\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b}))}$ is small. Those names are then suitable $\tilde{\xi}$ -names for $C^{\boldsymbol{b}}_{\boldsymbol{h}}$ . However, when we choose among those suitable names, we need to make sure that we are not giving the same name to too many columns, otherwise we might loose to much information, and we could not get (ii). This is done using Hall’s marriage lemma.

Proof of Lemma 3.7.

In this proof, we use many parameters, which we introduce below in a specific order to highlight the way they depend on each other:

(a)

Let $\varepsilon>0$ . This parameter is chosen first, as it appears in the statement of the lemma. Then we choose $\delta>0$ and $N\geq 1$ , as the numbers associated to $\varepsilon^{3}/4$ in the definition of extremality of $\mathcal{P}$ . We assume that $h_{\mu}(\mathcal{H},T)<\delta$ .
(b)

Let $\alpha>0$ . This is another arbitrarily small parameter that appears in the statement of the lemma. It does not depend on $\varepsilon$ nor $\delta$ .
(c)

Next, we introduce $0<\gamma<1$ , which must be small relative to $\alpha$ and $\varepsilon$ for (ii) and (iii) to hold. Specifically, we require that $\gamma\leq\varepsilon/2$ , and that the bound in Lemma 2.7 holds with error $\alpha$ , whenever $\mu(\xi_{0}\neq\zeta_{0})\leq 2\gamma$ , for any random variables $\xi_{0}$ and $\zeta_{0}$ .

(d)

Then we take $\beta>0$ , which is our most used parameter. We set $\beta$ satisfying the following:

\beta<\left\{\begin{array}[]{l}\delta-h_{\mu}(\mathcal{H},T);\\ \min(\varepsilon^{3}/24,\varepsilon/4);\\ \min(\gamma,\gamma h_{\mu}(\xi,T)/5);\\ \alpha/7.\end{array}\right.

Once $\beta$ is fixed, we choose $n_{0}\geq 1$ such that $\mathcal{P}_{0}\preceq_{\beta^{2}/2}(\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}$ .

(e)

Finally, we choose an integer $n$ , which will be the height of the Rokhlin tower. It is chosen larger than $N$ . We also need it to be large enough for us to apply the Shannon-McMillan-Breiman theorem, as well as Birkhoff’s ergodic theorem. As $n$ appears in many estimates, it needs to be large enough depending on $\varepsilon$ , $\delta$ , $\gamma$ , $\beta$ and $n_{0}$ . It would be quite tedious to give an explicit lower bound for $n$ , so, since all other parameters are now fixed and do not depend on $n$ , we simply point out throughout the proof the estimates where $n$ needs to be large.

Having now established the parameters, we begin the proof.

Step 1: The setup of the tower

As mentioned in (e), we choose $n$ so that we can apply the Shannon-McMillan-Breiman theorem (i.e. Theorem 3.5) and Birkhoff’s ergodic theorem to know that there exist two sets $\mathcal{E}_{n}^{0}\subset H^{[0,n[}$ and $\mathcal{B}_{n}^{0}\subset B^{[0,n[}$ such that

\begin{gathered}\mu(\mathcal{H}_{[0,n[}\in\mathcal{E}_{n}^{0})\geq 1-\beta/2,\;\text{ and }\;\mu(\xi_{[0,n[}\in\mathcal{B}_{n}^{0})\geq 1-\beta/3,\end{gathered}

(3)

on which the estimates (5), (6), (8) and (15) hold. Latter in the proof, we will see that we can take $\mathcal{E}_{n}\subset\mathcal{E}_{n}^{0}$ and $\mathcal{B}_{n}\subset\mathcal{B}_{n}^{0}$ subsets such that

\begin{gathered}\mu(\mathcal{H}_{[0,n[}\in\mathcal{E}_{n})\geq 1-\beta,\;\text{ and }\;\mu(\xi_{[0,n[}\in\mathcal{B}_{n})\geq 1-\beta,\end{gathered}

(4)

on which we also have (12) and (16). The fact that (12) holds for $\mathcal{E}_{n}$ appears in Step 2 and the fact that (16) holds for $\mathcal{B}_{n}$ appears in Step 3. Until then, we only use (4). For now, we present some of the estimates we have announced.

The first estimates given by the Shannon-McMillan-Breiman theorem are:

	$\displaystyle\forall\boldsymbol{h}\in\mathcal{E}_{n},\;2^{-(h_{\mu}(\mathcal{H},T)+\beta)n}\leq\mu(\mathcal{H}_{[0,n[}=\boldsymbol{h})\leq 2^{-(h_{\mu}(\mathcal{H},T)-\beta)n},$		(5)
	$\displaystyle\forall\boldsymbol{b}\in\mathcal{B}_{n},\;2^{-(h_{\mu}(\xi,T)+\beta)n}\leq\mu(\xi_{[0,n[}=\boldsymbol{b})\leq 2^{-(h_{\mu}(\xi,T)-\beta)n}.$		(6)

For any sequence $\boldsymbol{b}\in B^{[0,n[}$ and any element $b^{\prime}\in B$ , denote $f_{n}(\boldsymbol{b},b^{\prime})$ the frequency at which the element $b^{\prime}$ appears in the sequence $\boldsymbol{b}$ . This can also be defined as follows:

\forall x\in\{\xi_{[0,n[}=\boldsymbol{b}\},\;f_{n}(\boldsymbol{b},b^{\prime}):=\frac{1}{n}\sum_{j=0}^{n-1}\mathbbm{1}_{\{\xi_{0}=b^{\prime}\}}(T^{j}x).

(7)

From this definition of $f_{n}$ , it becomes clear that, as announced earlier, the estimate given by Birkhoff’s ergodic theorem is:

\sum_{b^{\prime}\in B}|f_{n}(\boldsymbol{b},b^{\prime})-\mu(\xi_{0}=b^{\prime})|\leq\beta.

(8)

Since $\mathcal{H}\vee\xi$ generates $\mathscr{A}$ , as said in (d), we can find $n_{0}\geq 1$ so that $\mathcal{P}_{0}\preceq_{\beta^{2}}(\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}$ . This means that there exists a $(\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}$ -measurable random variable $\tilde{\mathcal{P}}_{0}$ such that $\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0})\leq\beta^{2}$ .

By making use of Proposition 3.6, we can build a set $G$ such that $F^{\prime}:=TG$ is disjoint from $G$ and $F^{\prime}$ is the base of a tower $\mathcal{G}_{n}:=\{T^{j}F^{\prime}\}_{0\leq j\leq n-1}$ such that $\mu(\mathcal{G}_{n})\geq 1-\beta$ and

\mathcal{L}((\mathcal{H}\vee\tilde{\mathcal{P}}\vee\xi)_{[0,n[}\,|\,F^{\prime})=\mathcal{L}((\mathcal{H}\vee\tilde{\mathcal{P}}\vee\xi)_{[0,n[}).

(9)

The set $G$ will be useful later to code the entrance of the tower. We slightly reduce the tower by setting $F:=F^{\prime}\cap\{\mathcal{H}_{[0,n[}\in\mathcal{E}_{n}\}\cap\{\xi_{[0,n[}\in\mathcal{B}_{n}\}$ and $\mathcal{T}_{n}:=\{T^{j}F\}_{0\leq j\leq n-1}$ . One can then use (9) with our previous estimates to see that $\mu(\mathcal{T}_{n})\geq 1-3\beta$ (by making sure that $1/n\leq\beta$ ).

We then split $\mathcal{T}_{n}$ into $\mathcal{H}$ -columns: for $\boldsymbol{h}\in{\mathcal{E}}_{n}$ , we define

C_{\boldsymbol{h}}:=\{T^{j}(F\cap\{\mathcal{H}_{[0,n[}=\boldsymbol{h}\})\}_{0\leq j\leq n-1},

(10)

so that $\mathcal{T}_{n}=\bigsqcup_{\boldsymbol{h}\in{\mathcal{E}}_{n}}C_{\boldsymbol{h}}$ (we mean that the levels of $\mathcal{T}_{n}$ are disjoint unions of the levels of $C_{\boldsymbol{h}}$ ). For each $\boldsymbol{h}\in\mathcal{E}_{n}$ , we say that $C_{\boldsymbol{h}}$ is the column of $\mathcal{H}_{[0,n[}$ -name $\boldsymbol{h}$ . We also denote by $F_{\boldsymbol{h}}:=F\cap\{\mathcal{H}_{[0,n[}=\boldsymbol{h}\}$ the base of $C_{\boldsymbol{h}}$ .

Step 2: Using the extremality of $\mathcal{P}$

We plan on modifying $\xi$ into a process $\tilde{\xi}$ so that the joint law of $\mathcal{P}\vee\tilde{\xi}$ is almost the same in most of the columns $\{C_{\boldsymbol{h}}\}_{\boldsymbol{h}\in\mathcal{E}_{n}}$ . We start by using the fact that $\mathbf{X}$ is Bernoulli to see that the law of $\mathcal{P}$ is almost the same on each column $C_{\boldsymbol{h}}$ . Indeed, since $\mathbf{X}$ is Bernoulli, $\mathcal{P}$ is extremal, and we fixed $\delta>0$ and $N\geq 1$ as the numbers associated to $\varepsilon^{3}/4$ in the definition of extremality and assume that $h_{\mu}(\mathcal{H},T)<\delta$ (see (a)). On the other hand, from (5), we deduce that

\#\mathcal{E}_{n}\leq 2^{(h_{\mu}(\mathcal{H},T)+\beta)n}.

Next we define the partition

\mathcal{Q}:=\left\{\begin{array}[]{ll}*&\text{on }\{\mathcal{H}_{[0,n[}\notin\mathcal{E}_{n}\}\cup\{\xi_{[0,n[}\notin\mathcal{B}_{n}\}\\ \mathcal{H}_{[0,n[}&\text{on }\{\mathcal{H}_{[0,n[}\in\mathcal{E}_{n}\}\cap\{\xi_{[0,n[}\in\mathcal{B}_{n}\}\\ \end{array}.\right.

In particular, we know that $\mu(\mathcal{Q}=*)\leq 2\beta$ . Moreover, the number of values taken by $\mathcal{Q}$ is bounded by

\#\mathcal{E}_{n}+1\leq 2^{(h_{\mu}(\mathcal{H},T)+\beta)n}+1\leq 2^{n\delta},

since $\beta<\delta-h_{\mu}(\mathcal{H},T)$ . Therefore the extremality of $\mathcal{P}$ tells us that, since $n\geq N$ , there exists a subset $\bar{\mathcal{E}}_{n}\subset{\mathcal{E}}_{n}$ such that

\mu(\mathcal{Q}\notin(\bar{\mathcal{E}}_{n}\cup\{*\}))\leq\varepsilon^{3}/4+2\beta\leq\varepsilon^{3}\leq\varepsilon,

(11)

and for $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ , we have

\bar{d}_{n}(\mathcal{L}(\mathcal{P}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\mathcal{P}_{[0,n[}))\leq\varepsilon^{3}/4.

As mentioned at the start of the proof, the set $\mathcal{E}_{n}$ is chosen so that (4) holds and we have

\forall\boldsymbol{h}\in\mathcal{E}_{n},\;\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0}\,|\,\mathcal{Q}=\boldsymbol{h})\leq\beta.

(12)

This is possible because $\mu(\mathcal{P}_{0}\neq\tilde{\mathcal{P}}_{0})\leq\beta^{2}/2$ . Therefore, using that $\beta\leq\varepsilon^{3}/24$ with (12), this yields, for $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ :

\bar{d}_{n}(\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}))\leq\varepsilon^{3}/3.

(13)

Step 3: Framework for the construction of $\tilde{\xi}_{0}$

We start the construction of $\tilde{\xi}$ by setting $\tilde{\xi}_{0}:=*$ on $G=T^{-1}F^{\prime}$ , where $*$ represents a symbol that does not belong to $B$ . Later in the proof, this will allow us to detect the entrance into $\mathcal{T}_{n}$ from the value of the process $\tilde{\xi}$ . Then define $\tilde{\xi}_{0}$ to take any value in $B$ on the rest of $\mathcal{T}_{n}^{c}$ . For $\boldsymbol{h}\in\mathcal{E}_{n}\backslash\bar{\mathcal{E}}_{n}$ , on $C_{\boldsymbol{h}}$ , we set $\tilde{\xi}_{0}:=\xi_{0}$ . We are left with defining our new random variable $\tilde{\xi}_{0}$ on the columns $C_{\boldsymbol{h}}$ , with $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ . We start by fixing $\boldsymbol{h}_{0}\in\bar{\mathcal{E}}_{n}$ , and the column $C_{\boldsymbol{h}_{0}}$ will serve as a “model” for the other columns.

Next we fix an $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ . We define sub-columns of $C_{\boldsymbol{h}}$ : for $\boldsymbol{b}\in\mathcal{B}_{n}$ ,

C_{\boldsymbol{h}}^{\boldsymbol{b}}:=\{T^{j}(F\cap\{\mathcal{H}_{[0,n[}=\boldsymbol{h}\}\cap\{\xi_{[0,n[}=\boldsymbol{b}\})\}_{0\leq j\leq n-1}.

(14)

We say that $\boldsymbol{b}$ the $\xi$ -name of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ . Because of our definition of $F$ and (9), the set $\mathcal{B}_{n}$ gives us exactly the $\xi$ -names of all the sub-columns in $C_{\boldsymbol{h}}$ . We will then give each sub-column $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ a new word $\tilde{\boldsymbol{b}}\in\mathcal{B}_{n}$ and define the random variable $\tilde{\xi}_{0}$ on $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ as the only variable such that $\tilde{\boldsymbol{b}}$ is the $\tilde{\xi}$ -name of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ . This means that to conclude the construction of $\tilde{\xi}_{0}$ on $C_{\boldsymbol{h}}$ , we simply need to build a map $\varphi_{\boldsymbol{h}}:\mathcal{B}_{n}\longrightarrow\mathcal{B}_{n},$ and the properties we will obtain on $\tilde{\xi}$ will follow from our choice for $\varphi_{\boldsymbol{h}}$ .

In order to give us some additional leeway, we use the parameter $\gamma>0$ introduced at the start of the proof: we define $n_{1}:=\lfloor(1-\gamma)n\rfloor\leq n$ , and for $\boldsymbol{b}\in\mathcal{B}_{n}$ , we denote by $\boldsymbol{b}_{n_{1}}:=\boldsymbol{b}_{[0,n_{1}[}\in B^{[0,n_{1}[}$ the truncated sub-sequence of $\boldsymbol{b}$ of length $n_{1}$ . Conversely, for $\bar{\boldsymbol{b}}\in B^{[0,n_{1}[}$ , define

B(\bar{\boldsymbol{b}}):=\{\boldsymbol{b}\in\mathcal{B}_{n}\,|\,\boldsymbol{b}_{n_{1}}=\bar{\boldsymbol{b}}\},

and

\mathcal{B}_{n_{1}}:=\{\bar{\boldsymbol{b}}\in B^{[0,n_{1}[}\,|\,B(\bar{\boldsymbol{b}})\neq\varnothing\}.

We will obtain the map $\varphi_{\boldsymbol{h}}$ by building an injective map $\psi_{\boldsymbol{h}}:\mathcal{B}_{n_{1}}\longrightarrow\mathcal{B}_{n}$ and setting $\varphi_{\boldsymbol{h}}(\boldsymbol{b}):=\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}})$ . We start be recalling that we chose $n$ and $\mathcal{B}_{n}$ so that the estimate given by the Shannon-McMillan-Breiman Theorem, i.e. (6), still holds when replacing $n$ by $n_{1}$ . More precisely, we mean that, for $\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}$

2^{-(h_{\mu}(\xi,T)+\beta)n_{1}}\leq\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}})\leq 2^{-(h_{\mu}(\xi,T)-\beta)n_{1}}.

(15)

Moreover, we stated at the start of the proof that $\mathcal{B}_{n}$ is chosen such that, for $\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}$

\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n})\geq\frac{1}{2}\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}}).

(16)

We need to prove that statement. We do this by considering the set

\mathcal{C}:=\{\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}^{0}\,|\,\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\notin\mathcal{B}_{n}^{0})\geq\frac{1}{2}\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}})\}.

From the definition of $\mathcal{C}$ , we get

\frac{1}{2}\mu(\xi_{[0,n_{1}[}\in\mathcal{C})\leq\mu(\xi_{[0,n[}\notin\mathcal{B}_{n}^{0})\leq\beta/3.

Then, we define $\mathcal{B}_{n}$ as $\mathcal{B}_{n}:=\{\boldsymbol{b}\in\mathcal{B}_{n}^{0}\,|\,\boldsymbol{b}_{n_{1}}\notin\mathcal{C}\}$ , and easily get that $\mu(\xi_{[0,n[}\in\mathcal{B}_{n})\geq 1-\beta$ . Next, because the set we removed from $\mathcal{B}_{n}^{0}$ is measurable with respect to the truncated sequences of length $n_{1}$ , for $\bar{\boldsymbol{b}}\notin\mathcal{C}$ , we get

\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n})=\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n}^{0}),

so (16) follows from the definition of $\mathcal{C}$ .

Finally, putting (15) and (16) together, we get, for $\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}$

\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n})\geq\frac{1}{2}2^{-(h_{\mu}(\xi,T)+\beta)n_{1}}\geq 2^{-(h_{\mu}(\xi,T)+2\beta)n_{1}},

(17)

by making sure that $n_{1}$ large enough. This will enable us to control the measure of the part of the truncated column over $\{\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}}\}$ that is in $\mathcal{T}_{n}$ .

Step 4: Estimates for Hall’s marriage lemma

From (6) and (15), we can tell that

	$\displaystyle\#\mathcal{B}_{n_{1}}$	$\displaystyle\leq 2^{(h_{\mu}(\xi,T)+\beta)n_{1}}\leq 2^{(h_{\mu}(\xi,T)+\beta)(1-\gamma)n}$
		$\displaystyle\leq 2^{(h_{\mu}(\xi,T)-\gamma h_{\mu}(\xi,T)+\beta)n}\leq(1-\beta)2^{(h_{\mu}(\xi,T)-\beta)n}\leq\#\mathcal{B}_{n},$

since $2\beta<\gamma h_{\mu}(\xi,T)$ and we can choose $n$ large enough. This inequality is also clearly true from the definition of $\mathcal{B}_{n_{1}}$ , but we include this computation, as a similar one will be essential later in the proof. That being said, this inequality means that it is possible to find an injective map from $\mathcal{B}_{n_{1}}$ to $\mathcal{B}_{n}$ , but we want to be more specific about which injective map we choose. To that end, we will make use of Hall’s marriage lemma. To do that, for each $\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}$ , we need to specify which elements of $\mathcal{B}_{n}$ we consider suitable $\tilde{\xi}$ -names for the columns $\{C_{\boldsymbol{h}}^{\boldsymbol{b}};\,\boldsymbol{b}\in B(\bar{\boldsymbol{b}})\}$ .

We recall that $n_{0}$ is the integer chosen so that $\tilde{\mathcal{P}}_{0}$ is $(\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}$ -measurable. Define $L_{n}:=[n_{0},n_{1}-n_{0}[\subset\mathbb{Z}$ and $\ell:=n_{1}-2n_{0}$ the length of $L_{n}$ . Because $\tilde{\mathcal{P}}_{0}$ is $(\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}$ -measurable, $\tilde{\mathcal{P}}_{L_{n}}$ is $(\mathcal{H}\vee\xi)_{[0,n_{1}[}$ -measurable. So, for $\boldsymbol{h}$ fixed, for each $\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}$ , on the set $\{\mathcal{H}_{[0,n[}=\boldsymbol{h},\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}}\}$ , there can be only one value of $\tilde{\mathcal{P}}_{L_{n}}$ , which we denote $\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}})$ .

For $\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}$ , the suitable corresponding $\tilde{\xi}$ -names will be the elements $\boldsymbol{b}\in\mathcal{B}_{n}$ for which $d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\boldsymbol{b}_{n_{1}}),\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}}))\leq\varepsilon$ . More formally, we set

J_{\bar{\boldsymbol{b}}}:=\{\boldsymbol{b}\in\mathcal{B}_{n}\,|\,d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\boldsymbol{b}_{n_{1}}),\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}}))\leq\varepsilon\},

and we want to build $\psi_{\boldsymbol{h}}$ so that we have

\psi_{\boldsymbol{h}}(\bar{\boldsymbol{b}})\in J_{\bar{\boldsymbol{b}}},

(18)

for as many $\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}$ as possible.

From (13), it follows that

\bar{d}_{n}(\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}_{0}))\leq 2\varepsilon^{3}/3.

Therefore:

	$\displaystyle\bar{d}_{\ell}(\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,\|\,\mathcal{Q}=\boldsymbol{h})$	$\displaystyle,\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,\|\,\mathcal{Q}=\boldsymbol{h}_{0}))$
		$\displaystyle\leq\frac{n}{n_{1}-2n_{0}}\bar{d}_{n}(\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,\|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,\|\,\mathcal{Q}=\boldsymbol{h}_{0}))$
		$\displaystyle\leq\frac{n}{(1-\gamma)n-n_{0}}2\varepsilon^{3}/3<\varepsilon^{3},$

by choosing $n$ large enough. So there exists $\lambda\in\mathscr{P}(A^{L_{n}}\times A^{L_{n}})$ a coupling of $\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h})$ and $\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}_{0})$ such that

\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon^{3}.

Denote by $\lambda_{1}$ and $\lambda_{2}$ the marginals of $\lambda$ , i.e. $\lambda_{1}=\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h})$ and $\lambda_{2}=\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}_{0})$ . We are interested in the set $\mathcal{A}_{\ell}\subset A^{L_{n}}$ defined by

\mathcal{A}_{\ell}:=\{\boldsymbol{p}\in A^{L_{n}}\,;\,\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon\,|\,\boldsymbol{p}_{1}=\boldsymbol{p})\geq 1-\varepsilon\}.

The following gives an estimate on the measure of $\mathcal{A}_{\ell}$ :

	$\displaystyle\varepsilon^{3}\geq$	$\displaystyle\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\geq\int_{\boldsymbol{p}_{1}\notin\mathcal{A}_{\ell}}d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})$
		$\displaystyle=\int_{\boldsymbol{p}\notin\mathcal{A}_{\ell}}\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2}\,\|\,\boldsymbol{p}_{1}=\boldsymbol{p})d\lambda_{1}(\boldsymbol{p})$
		$\displaystyle\geq\int_{\boldsymbol{p}\notin\mathcal{A}_{\ell}}\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})>\varepsilon\,\|\,\boldsymbol{p}_{1}=\boldsymbol{p})\cdot\varepsilon d\lambda_{1}(\boldsymbol{p})$
		$\displaystyle>\varepsilon^{2}\mu(\tilde{\mathcal{P}}_{L_{n}}\notin\mathcal{A}_{\ell}\,\|\,\mathcal{Q}=\boldsymbol{h}),$

so $\mu(\tilde{\mathcal{P}}_{L_{n}}\notin\mathcal{A}_{\ell}\,|\,\mathcal{Q}=\boldsymbol{h})<\varepsilon$ . In other words, if we set

\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}):=\{\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}\,|\,\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}})\in\mathcal{A}_{\ell}\},

we have $\mu(\xi_{[0,n_{1}[}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\,|\,\mathcal{Q}=\boldsymbol{h})\geq 1-\varepsilon$ . The set $\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})$ is the set on which we want (18) to hold. Hall’s marriage lemma tells us that there exists an injective map $\psi_{\boldsymbol{h}}:\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\rightarrow\mathcal{B}_{n}$ for which (18) is true if we have the following:

\forall I\subset\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}),\;\#I\leq\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}.

(19)

Let $I\subset\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})$ . Consider $K:=\bigcup_{\bar{\boldsymbol{b}}\in I}\{\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}})\}\subset\mathcal{A}_{\ell}$ and note that

\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}=\{\boldsymbol{b}\in\mathcal{B}_{n}\,|\,d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\boldsymbol{b}_{n_{1}}),K)\leq\varepsilon\}.

Taking that into account, we have

\begin{split}\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}&\geq 2^{(h_{\mu}(\xi,T)-\beta)n}\mu(d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h_{0}},\xi_{[0,n_{1}[}),K)\leq\varepsilon,\xi_{[0,n[}\in\mathcal{B}_{n})\\ &=2^{(h_{\mu}(\xi,T)-\beta)n}\mu(d_{\ell}(\tilde{\mathcal{P}}_{L_{n}},K)\leq\varepsilon,\xi_{[0,n[}\in\mathcal{B}_{n}\,|\,\mathcal{H}_{[0,n[}=\boldsymbol{h}_{0})\\ &=2^{(h_{\mu}(\xi,T)-\beta)n}\mu(\xi_{[0,n[}\in\mathcal{B}_{n})\mu(d_{\ell}(\tilde{\mathcal{P}}_{L_{n}},K)\leq\varepsilon\,|\,\mathcal{Q}=\boldsymbol{h}_{0})\\ &\geq 2^{(h_{\mu}(\xi,T)-\beta)n}(1-\beta)\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon,\boldsymbol{p}_{1}\in K)\\ &\geq 2^{(h_{\mu}(\xi,T)-2\beta)n}\int_{\boldsymbol{p}\in K}\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon\,|\,\boldsymbol{p}_{1}=\boldsymbol{p})d\lambda_{1}(\boldsymbol{p})\\ &\geq 2^{(h_{\mu}(\xi,T)-2\beta)n}(1-\varepsilon)\lambda_{1}(K),\,\text{ because }K\subset\mathcal{A}_{\ell}\\ &\geq 2^{(h_{\mu}(\xi,T)-3\beta)n}\mu(\tilde{\mathcal{P}}_{L_{n}}\in K\,|\,\mathcal{Q}=\boldsymbol{h}).\end{split}

(20)

making sure again that $n$ is large enough. Moreover, using (17), we get

	$\displaystyle\#I$	$\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I,\xi_{[0,n[}\in\mathcal{B}_{n})$
		$\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I\,\|\,\xi_{[0,n[}\in\mathcal{B}_{n})$
		$\displaystyle=2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I\,\|\,\mathcal{Q}=\boldsymbol{h}),\,\text{ because }\xi\perp\!\!\!\perp\mathcal{H}$
		$\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\tilde{\mathcal{P}}_{L_{n}}\in K\,\|\,\mathcal{Q}=\boldsymbol{h}),$

by definition of $K$ . Together with (20), it yields

	$\displaystyle\#I$	$\displaystyle\leq 2^{((1-\gamma)(h_{\mu}(\xi,T)+2\beta)-(h_{\mu}(\xi,T)-3\beta))n}\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}$
		$\displaystyle\leq 2^{(5\beta-\gamma h_{\mu}(\xi,T))n}\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}\leq\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}},$

since $5\beta\leq\gamma h_{\mu}(\xi,T)$ . Therefore there exists an injective map $\psi_{\boldsymbol{h}}:\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\rightarrow\mathcal{B}_{n}$ for which (18) holds. As we noted that $\#\mathcal{B}_{n_{1}}\leq\#\mathcal{B}_{n}$ , $\psi_{\boldsymbol{h}}$ can then be extended to an injective map defined on $\mathcal{B}_{n_{1}}$ (still taking values in $\mathcal{B}_{n}$ ). We recall that, with $\psi_{\boldsymbol{h}}$ built, we set $\varphi_{\boldsymbol{h}}(\boldsymbol{b}):=\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}})$ .

As we announced at the start of our reasoning, we define $\tilde{\xi}_{0}$ on the levels of $C_{\boldsymbol{h}}$ so that the $\tilde{\xi}$ -name of each sub-column $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ is $\varphi_{\boldsymbol{h}}(\boldsymbol{b})=\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}})$ . Since this construction can be done with every $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ (with the map $\psi_{\boldsymbol{h}}$ depending on $\boldsymbol{h}$ ), we have completed the construction of $\tilde{\xi}_{0}$ . We now need to check that $\tilde{\xi}$ satisfies the conditions (i), (ii) and (iii) of our lemma.

Step 5: Proving that $\tilde{\xi}$ satsifies (i), (ii) and (iii)

We start by estimating the law of $\tilde{\xi}_{0}$ . Since $\mu(\mathcal{T}_{n})\geq 1-3\beta$ , we have

	$\displaystyle\sum_{b\in B}$	$\displaystyle\|\mu(\tilde{\xi}_{0}=b)-\mu(\xi_{0}=b)\|\leq\sum_{b\in B}\|\mu(\{\tilde{\xi}_{0}=b\}\cap\mathcal{T}_{n})-\mu(\xi_{0}=b)\mu(\mathcal{T}_{n})\|+6\beta$
		$\displaystyle\leq\sum_{b\in B}\sum_{\boldsymbol{h}\in\mathcal{E}_{n},\boldsymbol{b}\in\mathcal{B}_{n}}\|\mu(\{\tilde{\xi}_{0}=b\}\cap C_{\boldsymbol{h}}^{\boldsymbol{b}})-\mu(\xi_{0}=b)\mu(C_{\boldsymbol{h}}^{\boldsymbol{b}})\|+6\beta.$

We recall that $f_{n}(\boldsymbol{b},b^{\prime})$ is the frequency at which the element $b^{\prime}$ appears in the sequence $\boldsymbol{b}$ (see (7)). Moreover, one can see that, since $\varphi_{\boldsymbol{h}}(\boldsymbol{b})$ is the $\tilde{\xi}$ -name of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ and all the levels of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ have the same measure, we have

\mu(\{\tilde{\xi}_{0}=b\}\cap C_{\boldsymbol{h}}^{\boldsymbol{b}})=\mu(C_{\boldsymbol{h}}^{\boldsymbol{b}})\cdot f_{n}(\varphi_{\boldsymbol{h}}(\boldsymbol{b}),b).

Therefore, because $\varphi_{\boldsymbol{h}}$ takes values in $\mathcal{B}_{n}$ , (8) yields:

	$\displaystyle\sum_{b\in B}\|\mu(\tilde{\xi}_{0}=b)-\mu(\xi_{0}=b)\|$	$\displaystyle\leq\sum_{b\in B}\sum_{\boldsymbol{h}\in\mathcal{E}_{n},\boldsymbol{b}\in\mathcal{B}_{n}}\mu(C_{\boldsymbol{h}}^{\boldsymbol{b}})\|f_{n}(\varphi_{\boldsymbol{h}}(\boldsymbol{b}),b)-\mu(\xi_{0}=b)\|+6\beta$
		$\displaystyle\leq\beta\mu(\mathcal{T}_{n})+6\beta\leq 7\beta.$

This means that $\bar{d}_{1}(\mathcal{L}(\tilde{\xi}_{0}),\mathcal{L}(\xi_{0}))\leq 7\beta\leq\alpha$ (using (d)).

We now turn our attention to the entropy of $\mathcal{H}\vee\tilde{\xi}$ . The $\tilde{\xi}$ -name of a column $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ is $\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}})$ , and since $\psi_{\boldsymbol{h}}$ is invective, we can deduce $\boldsymbol{b}_{n_{1}}$ from the $\tilde{\xi}$ -name of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ . This means that, on the levels of the truncated tower $\mathcal{T}_{n_{1}}:=\{T^{j}F\}_{0\leq j\leq n_{1}-1}$ , $\xi_{0}$ is $(\mathcal{H}\vee\tilde{\xi})_{[-n_{1},n[}$ -measurable. Indeed, if $x$ is in $\mathcal{T}_{n_{1}}$ and the sequence $(\mathcal{H}\vee\tilde{\xi})_{[-n_{1},n[}(x)$ is known, the sequence $\tilde{\xi}_{[-n_{1},0[}(x)$ must contain a “ $*$ ”, which indicates the moment the past orbit of $x$ passes trough $G$ before entering $\mathcal{T}_{n}$ . So the position of “ $*$ ” in $\tilde{\xi}_{[-n_{1},0[}(x)$ tells us the index of the level of $\mathcal{T}_{n_{1}}$ the point $x$ is on, which we call $j_{0}$ . In other words, we mean that $T^{-j_{0}}x\in F$ . Then, $(\mathcal{H}\vee\tilde{\xi})_{[-j_{0},n-j_{0}[}$ gives the $(\mathcal{H}\vee\tilde{\xi})$ -name of the column $x$ is on, from which we deduce the truncated $\xi$ -name of length $n_{1}$ of the column. Finally, the $j_{0}$ -th letter of that name gives us $\xi_{0}(x)$ .

Therefore, if we combine the previous paragraph with the fact that $\mu(\mathcal{T}_{n_{1}})\geq(1-\gamma)\mu(\mathcal{T}_{n})$ , there exists a $(\mathcal{H}\vee\tilde{\xi})_{[-n_{1},n[}$ -measurable random variable $\chi_{0}$ such that $\mu(\chi_{0}\neq\xi_{0})\leq\beta+\gamma\leq 2\gamma$ (because $\beta\leq\gamma$ ). So, by the choice of $\gamma$ made in (c), we can apply Lemma 2.7 and conclude that

\displaystyle h_{\mu}(\mathcal{H}\vee\xi,T)

\displaystyle\leq h_{\mu}(\mathcal{H}\vee\chi,T)+\alpha\leq h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)+\alpha.

Because $\mathcal{H}\vee\xi$ generates $\mathscr{A}$ , we also have the converse inequality

h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq h_{\mu}(\mathcal{H}\vee\xi,T),

so we have proved that $\tilde{\xi}$ satisfies condition (ii) of our lemma.

We are now left with proving (iii). If we consider that $\tilde{\xi}_{[-n,n[}(x)$ is known, we deduce that, if the symbol “ $*$ ” appears in $\tilde{\xi}_{[-n,0[}$ , then $x$ is in $\mathcal{T}_{n}$ and the position of “ $*$ ” tells us the index $j_{0}$ of the level of $\mathcal{T}_{n}$ the point $x$ is on. Then, using the notation introduced in our construction above, we can look at the random variable $\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})$ . It is $\tilde{\xi}_{[-n,n[}$ -measurable and we are going to show that it satisfies

\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\mathcal{P}_{0})\leq 5\varepsilon.

(21)

We start by looking at a column $C_{\boldsymbol{h}}$ for some $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ . We then split it into sub-columns $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ . If $\boldsymbol{b}_{n_{1}}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})$ , we are going to use (18). First, we need to remember that if $x$ is in $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ , then $\tilde{\xi}_{[-j_{0},n-j_{0}[}(x)$ gives the $\tilde{\xi}$ -name of the column $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ . But, by construction, that name is $\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}})$ , and, because we are looking at the case where $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ and $\boldsymbol{b}_{n_{1}}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})$ , (18) holds. So we have

d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[}),\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\boldsymbol{b}_{n_{1}}))\leq\varepsilon.

We recall that $L_{n}=[n_{0},n_{1}-n_{0}[$ and $\ell$ is its length. By definition of $d_{\ell}$ , we know that the number of levels $j_{0}\in L_{n}$ on which we have $\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})=\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h},\boldsymbol{b}_{n_{1}})$ is greater than $(1-\varepsilon)\ell$ . Moreover, by construction, for $j_{0}\in L_{n}$ , on the $j_{0}$ -th level of $C_{\boldsymbol{h}}^{\boldsymbol{b}}$ , we have $\tilde{\mathcal{P}}_{0}=\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h},\boldsymbol{b}_{n_{1}})$ . Finally, since $\ell=n_{1}-2n_{0}$ , we have

	$\displaystyle\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\tilde{\mathcal{P}}_{0}\,\|\,C_{\boldsymbol{h}}^{\boldsymbol{b}})$	$\displaystyle\leq\frac{n-(1-\varepsilon)(n_{1}-2n_{0})}{n}$
		$\displaystyle\leq\frac{n-(1-\varepsilon)(1-\gamma)n+2n_{0}}{n}$
		$\displaystyle\leq\varepsilon+\gamma+\frac{2n_{0}}{n}\leq 2\varepsilon,$

since $\gamma\leq\varepsilon/2$ and we can assume that $n$ is large enough so that $2n_{0}/n\leq\varepsilon/2$ . Moreover, the fact that $\boldsymbol{h}\in\bar{\mathcal{E}}_{n}$ implies that $\mu(\xi_{[0,n_{1}[}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\,|\,\mathcal{Q}=\boldsymbol{h})\geq 1-\varepsilon$ , and, combining it with (9), we can see that

\mu\left(\left.\bigcup_{\boldsymbol{b}_{n_{1}}\notin\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})}C_{\boldsymbol{h}}^{\boldsymbol{b}}\,\right|\,C_{\boldsymbol{h}}\right)\leq\varepsilon.

Therefore

\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\tilde{\mathcal{P}}_{0}\,|\,C_{\boldsymbol{h}})\leq 3\varepsilon.

	$\displaystyle\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\tilde{\mathcal{P}}_{0})$	$\displaystyle\leq 3\varepsilon+\mu\left(\bigcup_{\boldsymbol{h}\notin\bar{\mathcal{E}}_{n}}C_{\boldsymbol{h}}\right)+\mu(\mathcal{T}_{n}^{c})$
		$\displaystyle\leq 3\varepsilon+\varepsilon+\mu(\mathcal{T}_{n}^{c})\;\text{ using \eqref{eq:indep_base_tower} and \eqref{eq:set_extremality}}$
		$\displaystyle\leq 4\varepsilon+3\beta.$

Finally, $\tilde{\mathcal{P}}_{0}$ was chosen so that $\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0})\leq\beta^{2}\leq\beta$ , since $\beta\leq\varepsilon/4$ , we have proven (21), and therefore, up to replacing $\varepsilon$ by $\varepsilon/5$ , we have shown that

\mathcal{P}_{0}\preceq_{\varepsilon}\sigma(\tilde{\xi}).

∎

3.2 Application of the technical lemma

We are now left with proving Proposition 3.2 using Lemma 3.7. This is done using some abstract results from Thouvenot [26, Proposition 2’, Proposition 3]. We start by rewriting those results with our notation. We give a slight simplification, adapted to our setup.

First, [26, Proposition 2’] tells us that a process close enough to an i.i.d. process independent from $\mathscr{H}$ in law and entropy can be turned into an i.i.d. process independent from $\mathscr{H}$ .

Proposition 3.8.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic system of finite entropy. Let $\mathcal{H}$ be a finite valued process defined on $\mathbf{X}$ and $\rho$ be a probability measure on a finite alphabet $B$ . For every $\varepsilon>0$ , there exist $\alpha>0$ such that if a random variable $\tilde{\xi}_{0}:X\rightarrow B$ satisfies

(i)

$\bar{d}_{1}(\rho,\mathcal{L}(\tilde{\xi}_{0}))\leq\alpha$ ,
(ii)

and $0\leq h_{\mu}(\mathcal{H},T)+H(\rho)-h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq\alpha$ ,

then there exists a random variable $\xi^{\prime}_{0}$ of law $\rho$ such that the process $\xi^{\prime}:=(\xi^{\prime}_{0}\circ T^{n})_{n\in\mathbb{Z}}$ is i.i.d., independent from $\mathcal{H}$ and we have

\mu(\tilde{\xi}_{0}\neq\xi^{\prime}_{0})\leq\varepsilon.

Next, [26, Proposition 3] tells us that on a system that is relatively Bernoulli over a factor $\mathscr{H}$ , any i.i.d. process independent from $\mathscr{H}$ with the right entropy can be turned into an independent complement of $\mathscr{H}$ :

Proposition 3.9.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic system, $\mathcal{H}$ a finite valued process and $\xi$ a finite valued i.i.d. process independent from $\mathcal{H}$ such that $\mathscr{A}=\sigma(\mathcal{H})\vee\sigma(\xi)$ mod $\mu$ . For any $\varepsilon>0$ and any i.i.d. process $\zeta$ independent from $\mathcal{H}$ such that $h_{\mu}(\xi,T)=h_{\mu}(\zeta,T)$ , there exists $\tilde{\zeta}_{0}$ such that $\mathcal{L}(\mathcal{H}\vee\tilde{\zeta})=\mathcal{L}(\mathcal{H}\vee\zeta)$ , $\mathscr{A}=\sigma(\mathcal{H})\vee\sigma(\tilde{\zeta})$ mod $\mu$ and

\mu(\tilde{\zeta}_{0}\neq\zeta_{0})\leq\varepsilon.

We are now fully equipped to end the proof of Proposition 3.2:

Proof of Proposition 3.2.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a Bernoulli shift of finite entropy and $\mathcal{P}_{0}:X\rightarrow A$ a finite valued random variable such that $\mathscr{A}=\sigma(\{\mathcal{P}_{0}\circ T^{n}\}_{n\in\mathbb{Z}})$ . As we consider a factor $\sigma$ -algebra $\mathscr{H}$ of $\mathbf{X}$ , it has finite entropy, therefore there exists a finite valued random variable $\mathcal{H}_{0}:X\rightarrow H$ such that the process $\mathcal{H}:=(\mathcal{H}_{0}\circ T^{n})_{n\in\mathbb{Z}}$ generates $\mathscr{H}$ . Lastly, we take an i.i.d. process $\xi$ independent from $\mathscr{H}$ such that $\mathscr{A}=\mathscr{H}\vee\sigma(\xi)$ mod $\mu$ . Let $\varepsilon>0$ .

Now, Lemma 3.7 tells us that there is $\delta>0$ for which, if $h_{\mu}(\mathscr{H},T)\leq\delta$ , then for any $\alpha>0$ , there is a random variable $\tilde{\xi}_{0}$ such that

(i)

$\bar{d}_{1}(\mathcal{L}(\xi_{0}),\mathcal{L}(\tilde{\xi}_{0}))\leq\alpha$ ,
(ii)

$0\leq h_{\mu}(\mathcal{H}\vee\xi,T)-h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq\alpha$ ,
(iii)

and $\mathcal{P}_{0}\preceq_{\varepsilon/4}\sigma(\tilde{\xi})$ .

Denote $\tilde{\mathcal{P}}_{0}$ a $\tilde{\xi}$ -measurable random variable such that $\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0})\leq\varepsilon/4$ . We can find an integer $N\geq 1$ for which $\tilde{\mathcal{P}}_{0}\preceq_{\varepsilon/4}\tilde{\xi}_{[-N,N]}$ and set $\varepsilon_{1}:=\varepsilon/(4(2N+1))$ . If $\alpha$ is chosen small enough, then Proposition 3.8 tells us that there is a random variable $\xi^{\prime}_{0}$ such that the process $(\xi^{\prime}_{0}\circ T^{n})_{n\in\mathbb{Z}}$ is i.i.d., independent from $\mathcal{H}$ and we have $\mu(\xi^{\prime}_{0}\neq\tilde{\xi}_{0})\leq\varepsilon_{1}$ . Finally, Proposition 3.9 tells us that we can then find a random variable $\xi^{\prime\prime}_{0}$ for which the process $(\xi^{\prime\prime}_{0}\circ T^{n})_{n\in\mathbb{Z}}$ is still i.i.d., independent from $\mathcal{H}$ , but we also have that $\mathscr{A}=\mathscr{H}\vee\sigma(\xi^{\prime\prime})$ mod $\mu$ and $\mu(\xi^{\prime}_{0}\neq\tilde{\xi}_{0})\leq\varepsilon_{1}$ . So we have $\mu(\xi^{\prime\prime}_{0}\neq\tilde{\xi}_{0})\leq 2\varepsilon_{1}$ .

Combining that with the fact that $\tilde{\mathcal{P}}_{0}\preceq_{\varepsilon/4}\tilde{\xi}_{[-N,N]}$ , we get that $\tilde{\mathcal{P}}_{0}\preceq_{3\varepsilon/4}\xi^{\prime\prime}_{[-N,N]}$ , so

\mathcal{P}_{0}\preceq_{\varepsilon}\xi^{\prime\prime}_{[-N,N]}.

Setting $\mathscr{B}:=\sigma(\xi^{\prime\prime})$ , we get the Bernoulli factor desired to prove our proposition. ∎

3.3 Proof of Theorem 3.1

In the previous section, we managed to conclude the proof of Proposition 3.2. We now see how Theorem 3.1 follows from that proposition:

Proof of Theorem 3.1.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a Bernoulli system and $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ be a weak Pinsker filtration. Since $\mathscr{F}$ is a weak Pinsker filtration, if $(\mathscr{F}_{n})_{\leq-1}$ is of product type, so is $\mathscr{F}$ . Therefore, up to replacing $\mathbf{X}$ by the factor generated by $\mathscr{F}_{-1}$ , we can assume that $\mathbf{X}$ has finite entropy. Thanks to Theorem 2.18, this means that we can set a finite alphabet $A$ and a random variable $\mathcal{P}_{0}:X\rightarrow A$ such that the corresponding process $\mathcal{P}:=(\mathcal{P}_{0}\circ T^{i})_{i\in\mathbb{Z}}$ generates $\mathscr{A}$ , i.e. $\mathscr{A}=\sigma(\mathcal{P})$ mod $\mu$ . Let $(\varepsilon_{k})_{k\geq 1}$ be a decreasing sequence of positive numbers such that $\lim_{k\rightarrow\infty}\varepsilon_{k}=0$ .

We need to build a strictly increasing sequence $(n_{k})_{k\leq 0}$ such that $(\mathscr{F}_{n_{k}})_{k\leq 0}$ is of product type. We start by setting $n_{0}=0$ . Since $\lim_{n\rightarrow-\infty}h_{\mu}(\mathscr{F}_{n},T)=0$ , we can choose $n_{-1}\leq 0$ large enough (in absolute value), so that $h_{\mu}(\mathscr{F}_{n_{-1}},T)$ is small enough for Proposition 3.2 to enable us to build a Bernoulli factor $\sigma$ -algebra $\mathscr{B}_{n_{-1}}$ that is an independent complement of $\mathscr{F}_{n_{-1}}$ such that $\mathcal{P}_{0}\preceq_{\varepsilon_{1}}\mathscr{B}_{n_{-1}}$ .

Now take $k\leq-1$ and assume that we have built $(\mathscr{B}_{n_{-1}},...,\mathscr{B}_{n_{k}})$ such that they are mutually independent Bernoulli factors such that for $k\leq j\leq-1$ , $\mathscr{B}_{n_{j}}$ is independent from $\mathscr{F}_{n_{j}}$ , $\mathscr{F}_{n_{j+1}}=\mathscr{F}_{n_{j}}\vee\mathscr{B}_{n_{j}}$ and we have

\mathcal{P}_{0}\preceq_{\varepsilon_{|k|}}\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}.

(22)

By construction of the $\mathscr{B}_{n_{j}}$ , we know that $\mathcal{P}$ is measurable with respect to $\mathscr{F}_{n_{k}}\vee\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}$ . Moreover, using again Theorem 2.18, there is a random variable $\mathcal{P}^{(k)}_{0}$ such that the process $\mathcal{P}^{(k)}:=(\mathcal{P}^{(k)}_{0}\circ T^{i})_{i\in\mathbb{Z}}$ generates $\mathscr{F}_{n_{k}}$ . So there exists an integer $N\geq 1$ such that

\mathcal{P}_{0}\preceq_{\varepsilon_{|k|+1}/2}\mathcal{P}^{(k)}_{[-N,N]}\vee\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}.

(23)

Then set $\tilde{\varepsilon}:=\varepsilon_{|k|+1}/(2(2N+1))>0$ . As we did above, we choose $n_{k-1}\leq n_{k}$ large enough in absolute value so that $h_{\mu}(\mathscr{F}_{n_{k-1}},T)$ is small enough for us to apply Proposition 3.2 to find a Bernoulli factor $\mathscr{B}_{n_{k-1}}\subset\mathscr{F}_{n_{k}}$ such that $\mathscr{B}_{n_{k-1}}{\perp\!\!\!\perp}\mathscr{F}_{n_{k-1}}$ , $\mathscr{F}_{n_{k}}=\mathscr{F}_{n_{k-1}}\vee\mathscr{B}_{n_{k-1}}$ and

\mathcal{P}^{(k)}_{0}\preceq_{\tilde{\varepsilon}}\mathscr{B}_{n_{k-1}}.

(24)

Putting (23) and (24) together, we get

\mathcal{P}_{0}\preceq_{\varepsilon_{|k|+1}}\bigvee_{k-1\leq j\leq-1}\mathscr{B}_{n_{j}}.

Iterating this for every $k\leq-1$ ends our construction of $(n_{k})_{k\leq 0}$ and $(\mathscr{B}_{n_{k}})_{k\leq-1}$ . Therefore (22) holds for every $k\leq-1$ . It follows then that $\mathcal{P}_{0}$ is measurable with respect to

\bigvee_{j\leq-1}\mathscr{B}_{n_{j}}.

Since the $\mathscr{B}_{n_{j}}$ are factor $\sigma$ -algebras, the full process $\mathcal{P}$ is also $\bigvee_{j\leq-1}\mathscr{B}_{n_{j}}$ -measurable. Finally, $\mathcal{P}$ generates $\mathscr{A}$ , so

\bigvee_{j\leq-1}\mathscr{B}_{n_{j}}=\mathscr{A}=\mathscr{F}_{0}\,\text{ mod }\mu.

Let $k\leq-1$ , and set $\mathscr{E}_{1}:=\bigvee_{j\leq k-1}\mathscr{B}_{n_{j}}$ and $\mathscr{E}_{2}:=\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}$ . By construction, we have

\mathscr{E}_{1}\subset\mathscr{F}_{n_{k}},\;\mathscr{F}_{n_{k}}\perp\!\!\!\perp\mathscr{E}_{2},\;\text{ and }\;\mathscr{F}_{0}=\mathscr{E}_{1}\vee\mathscr{E}_{2}.

We use this to see that if $f$ is $\mathscr{F}_{n_{k}}$ -measurable, we have

f=\mathbb{E}[f\,|\,\mathscr{F}_{0}]=\mathbb{E}[f\,|\,\mathscr{E}_{1}\vee\mathscr{E}_{2}]=\mathbb{E}[f\,|\,\mathscr{E}_{1}],

which proves that

\mathscr{F}_{n_{k}}=\mathscr{E}_{1}=\bigvee_{j\leq k-1}\mathscr{B}_{n_{j}}\;\text{ mod }\mu.

∎

4 Examples of weak Pinsker filtrations generated by a cellular automaton

Up to this point, we have discussed the existence and abstract properties of weak Pinsker filtrations. Now we want to give explicit examples to get a more concrete idea of what those objects can look like. We take inspiration from [10] and use cellular automata to generate our filtrations. We describe in the following paragraphs how this is done.

Let $A$ be a finite alphabet. A cellular automaton (or, more precisely, a deterministic cellular automaton) $\tau:A^{\mathbb{Z}}\rightarrow A^{\mathbb{Z}}$ maps $A^{\mathbb{Z}}$ onto itself as follows: take $F\subset\mathbb{Z}$ finite, which we call a neighborhood, and a local map $\tau_{0}:A^{F}\rightarrow A$ . Then define

\tau:(a_{n})_{n\in\mathbb{Z}}\mapsto(\tau_{0}((a_{n+k})_{k\in F}))_{n\in\mathbb{Z}}.

Here, we will only consider examples in which $F=\{0,1\}$ . Therefore, our automata will be determined by a local map of the form $\tau_{0}:A^{\{0,1\}}\rightarrow A$ . One can note that, by construction, cellular automata commute with the shift transformation

S:(a_{n})_{n\in\mathbb{Z}}\mapsto(a_{n+1})_{n\in\mathbb{Z}}.

So we can consider a dynamical system of the form $\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S)$ where $\nu$ is a $S$ -invariant measure, and note that the $\sigma$ -algebra $\sigma(\tau)$ generated by $\tau$ is a factor $\sigma$ -algebra. We can do better and iterate $\tau$ to generate a filtration:

\text{for }n\leq 0,\;\mathscr{F}_{n}:=\sigma(\tau^{|n|}).

In that case, each $\mathscr{F}_{n}$ is a factor $\sigma$ -algebra of $\mathbf{Y}$ , and therefore $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ is a dynamical filtration. So, we see that cellular automata give a natural way to construct dynamical filtrations.

In fact, the theory of dynamical filtrations we presented in Section 2.2 was initiated in [10] in the setting of filtrations generated by cellular automata. However, the automata studied there preserve the product measure, and therefore the entropy of the associated factor $\sigma$ -algebras $\mathscr{F}_{n}$ will be the same for every $n\leq 0$ . This prevents the filtration from being weak Pinsker.

Here, we will consider a different automaton: take $A$ a finite alphabet and assume that one element of $A$ is labeled << $0$ >>. Then define the following local map

\begin{array}[]{cccc}\tau_{0}:&A^{2}&\longrightarrow&A\\ &(\alpha_{1},\alpha_{2})&\mapsto&\left\{\begin{array}[]{ll}\alpha_{1}&\text{ if }\,\alpha_{1}=\alpha_{2}\\ 0&\text{ otherwise}.\end{array}\right.\end{array}

(25)

The associated automaton will eliminate isolated elements, replacing them with $0$ , and a maximal string of the form $\alpha\cdots\alpha\alpha$ is replaced with $\alpha\cdots\alpha 0$ . For example, if $A=\{0,1\}$ , this gives:

[Uncaptioned image]

Therefore, as we iterate the automaton, the proportion of << $0$ >> increases as all other elements are gradually replaced by << $0$ >>. Heuristically, this indicates that the entropy of the factor $\sigma$ -algebras $\sigma(\tau^{|n|})$ will go to zero as $n$ goes to infinity. But to state this rigorously, one need to specify the system $\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S)$ on which we define $\mathscr{F}$ . More accurately, it is the alphabet $A$ and the measure $\nu$ that need to be specified. However, the entropy $h_{\nu}(\mathscr{F}_{n})$ goes to $0$ regardless of the choice of $A$ and $\nu$ :

Proposition 4.1.

Let $\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S)$ , where $\nu$ is a $S$ -invariant measure and let $\xi$ be the coordinate process on $\mathbf{Y}$ . For every $n\geq 1$ , we have

h_{\nu}(\tau^{n}\xi,S)\leq\frac{\log(\#An^{2})}{n}.

Proof.

Let $B_{n}\subset A^{n}$ be the set of values taken by $(\tau^{n}\xi)_{[0,n[}$ . We know that

H_{\nu}((\tau^{n}\xi)_{[0,n[})\leq\log(\#B_{n}).

Because of the structure of $\tau$ , in $\tau^{n}\xi$ , for $\alpha\neq 0$ , any run of << $\alpha$ >> is placed in between two runs of << $0$ >> of length at least $n+1$ . Therefore, $(\tau^{n}\xi)_{[0,n[}$ is either a sequence of << $0$ >> or composed of one run of << $\alpha$ >> (with $\alpha\neq 0$ ) in between runs of << $0$ >>. So

\#B_{n}\leq 1+(\#A-1)n^{2}\leq\#An^{2}.

In conclusion

h_{\nu}(\tau^{n}\xi,S)\leq\frac{1}{n}H_{\nu}((\tau^{n}\xi)_{[0,n[})\leq\frac{\log(\#An^{2})}{n}.

∎

In Section 4.1, we deal with the case where $\mathbf{Y}$ is a Bernoulli shift, and in Section 4.2, we deal with the case where $\mathbf{Y}$ is Ornstein’s example of a non-Bernoulli K-system from [15]. In both cases, by Proposition 4.1, the entropy of the filtration generated by the cellular automaton goes to zero. Then we look at each example separately to show the more involved result: each $\mathscr{F}_{n+1}$ is relatively Bernoulli over $\mathscr{F}_{n}$ . Therefore, we get two examples of weak Pinsker filtrations.

It is interesting to note that those two filtrations are very similar in their construction, but the filtration (or any sub-sequence) on Ornstein’s K-system cannot be of product type (otherwise, the system would be Bernoulli), we know from Theorem 3.1 that the latter has a sub-sequence that is of product type. It shows that there can be subtle differences in the asymptotic structure of weak Pinsker filtrations.

4.1 A cellular automaton on a Bernoulli shift

Here, we consider a Bernoulli shift $\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S)$ where $\nu$ is a product measure. To avoid unnecessarily complicated notations, we will also assume that $A=\{0,1\}$ and $\nu:=(\frac{1}{2}(\delta_{0}+\delta_{1}))^{\otimes\mathbb{Z}}$ . Therefore, the local function (25) becomes:

\begin{array}[]{cccc}\tau_{0}:&\{0,1\}^{2}&\longrightarrow&\{0,1\}\\ &\alpha&\mapsto&\left\{\begin{array}[]{ll}1&\text{ if }\,\alpha=(1,1)\\ 0&\text{ otherwise}.\end{array}\right.\end{array}

And we study the corresponding automaton:

\begin{array}[]{cccc}\tau:&\{0,1\}^{\mathbb{Z}}&\longrightarrow&\{0,1\}^{\mathbb{Z}}\\ &(a_{n})_{n\in\mathbb{Z}}&\mapsto&(\tau_{0}(a_{n},a_{n+1}))_{n\in\mathbb{Z}}\\ \end{array}

The automaton replaces an isolated << $1$ >> with a << $0$ >> and reduces sequences of << $1$ >> by replacing the final one by a << $0$ >>.

Theorem 4.2.

On the system $\mathbf{Y}:=(\{0,1\}^{\mathbb{Z}},\mathscr{B},\nu,S)$ , the filtration given by $\mathscr{F}:=(\sigma(\tau^{|n|}))_{n\leq 0}$ is a weak Pinsker filtration. That is, for every $n\leq-1$ , $\mathscr{F}_{n+1}$ is relatively Bernoulli over $\mathscr{F}_{n}$ and we have

h_{\nu}(\mathscr{F}_{n})\underset{n\rightarrow-\infty}{\longrightarrow}0.

(26)

The convergence of the entropy follows from Proposition 4.1. However, when $\mathbf{Y}$ is a Bernoulli shift, we can compute a better bound, as stated in Proposition 4.3.

Proposition 4.3.

Let $\xi$ denote the coordinate process on $\mathbf{Y}$ . For every $n\geq 0$ , we have

h_{\nu}(\tau^{n}\xi,S)\leq 3\log(2)2^{-n/2}.

Proof.

Let $n\geq 0$ . One can see that $\tau^{n}\xi$ is $1$ at $i$ if and only if $\xi$ is $1$ over the entire segment $[i,i+n]$ , as shown below:

[Uncaptioned image]

We set $k_{n}:=\lceil n/2\rceil$ , and we remark that

\nu(\{\exists i\in[0,k_{n}],(\tau^{n}\xi)_{i}=1\})\leq\nu(\{\xi_{[k_{n},n]}=(1,...,1)\})\leq 1/2^{n-k_{n}+1}\leq 1/2^{n/2}.

Then, combining this with Fano’s inequality (Lemma 2.5) we get

H_{\nu}((\tau^{n}\xi)_{[0,k_{n}]})\leq 2^{-n/2}(1+\log(2^{k_{n}+1})+\log(2^{n/2}))\leq 2^{-n/2}3(k_{n}+1)\log(2),

and we can conclude for the KS-entropy:

h_{\nu}(\tau^{n}\xi,S)\leq\frac{1}{k_{n}+1}H_{\nu}((\tau^{n}\xi)_{[0,k_{n}]})\leq 3\log(2)2^{-n/2}.

∎

In addition, we give the following simple lemma on conditional independence:

Lemma 4.4.

Let $(X,\mathscr{A},\mu)$ be a probability space and $\mathscr{Z}$ a sub- $\sigma$ -algebra. Let $A$ , $B$ , $U$ and $V$ be random variables such that

(A,U)\perp\!\!\!\perp_{\mathscr{Z}}(B,V).

Then we have

	$\displaystyle\mathcal{L}(A,B\,\|\,U,V,\mathscr{Z})$	$\displaystyle=\mathcal{L}(A\,\|\,U,\mathscr{Z})\otimes\mathcal{L}(B\,\|\,V,\mathscr{Z})$
		$\displaystyle=\mathcal{L}(A\,\|\,U,V,\mathscr{Z})\otimes\mathcal{L}(B\,\|\,U,V,\mathscr{Z})$

Proof.

It follows from the fact that if $A^{\prime}$ , $B^{\prime}$ , $U^{\prime}$ and $V^{\prime}$ are respectively $A$ , $B$ , $U$ and $V$ -measurable random variables:

	$\displaystyle\mathbb{E}[A^{\prime}\cdot B^{\prime}\cdot U^{\prime}\cdot V^{\prime}\,\|\,\mathscr{Z}]$	$\displaystyle=\mathbb{E}[A^{\prime}\cdot U^{\prime}\,\|\,\mathscr{Z}]\cdot\mathbb{E}[B^{\prime}\cdot V^{\prime}\,\|\,\mathscr{Z}]$
		$\displaystyle=\mathbb{E}[\mathbb{E}[A^{\prime}\,\|\,U,\mathscr{Z}]\cdot U^{\prime}\,\|\,\mathscr{Z}]\cdot\mathbb{E}[\mathbb{E}[B^{\prime}\,\|\,V,\mathscr{Z}]\cdot V^{\prime}\,\|\,\mathscr{Z}]$
		$\displaystyle=\mathbb{E}[\mathbb{E}[A^{\prime}\,\|\,U,\mathscr{Z}]\cdot U^{\prime}\cdot\mathbb{E}[B^{\prime}\,\|\,V,\mathscr{Z}]\cdot V^{\prime}\,\|\,\mathscr{Z}].$

∎

Proposition 4.5.

Let $\xi$ be the coordinate process on $\mathbf{Y}$ . For every $n\geq 1$ , $\xi$ is relatively very weak Bernoulli over $\tau^{n}\xi$ .

Proof.

Set $\eta:=\tau^{n}\xi$ . Relative very weak Bernoullicity was defined in Definition 2.15. We recall some notation: take $\lambda\in\mathscr{P}(\{0,1\}^{\mathbb{Z}}\times\{0,1\}^{\mathbb{Z}})$ to be the law of $(\eta,\xi)$ , and for $I,J\subset\mathbb{Z}$ and $a,b\in\{0,1\}^{\mathbb{Z}}$ , $\lambda_{\ell}(\cdot\,|\,a_{I},b_{J})$ is the conditional law of $\xi_{[0,\ell[}$ given that $\eta_{I}=a_{I}$ and $\xi_{J}=b_{J}$ .

Let $\varepsilon>0$ . We need to show that there exists $\ell\geq 1$ such that for every $m\geq 1$ and for $k\geq 1$ large enough, we have

\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\lambda(a,b)\leq\varepsilon.

(27)

Let $m\geq 1$ . We start by noting that there must be some << $1$ >> that appears in $\eta$ : indeed, the law of large numbers tells us that there exists $\ell_{0}\geq 1$ such that

\mu(\underset{:=A}{\underbrace{\{\exists i\in[0,\ell_{0}[\,;\;\eta_{i}=1\}}})\geq 1-\varepsilon.

(28)

We then set $\ell:=\lceil\frac{1}{\varepsilon}\rceil\ell_{0}$ . Next, we take $k\geq\ell_{0}$ so that $\eta_{[-k,k]}$ determines entirely $A$ .

We fix $i\in[0,\ell_{0}[$ . First, we note that, as we can see on the following image

[Uncaptioned image]

if $\eta_{i}=1$ , then $(\xi_{]-\infty,i[},\eta_{]-\infty,i[})$ is $\xi_{]-\infty,i[}$ -measurable and $(\xi_{]i,\infty[},\eta_{]i,\infty[})$ is $\xi_{]i+n,\infty[}$ -measurable. So, since the variables $\{\xi_{j}\}_{j\in\mathbb{Z}}$ are independent, given $\{\eta_{i}=1\}$ the variables $(\xi_{]-\infty,i[},\eta_{]-\infty,i[})$ and $(\xi_{]i,\infty[},\eta_{]i,\infty[})$ are independent. Finally, using Lemma 4.4, for $a\in A^{\mathbb{Z}}$ such that $a_{i}=1$ , we get:

	$\displaystyle\mathcal{L}(\xi_{]-\infty,i[},\xi_{]i,\infty[}\,\|\,$	$\displaystyle\eta_{[-k,k]}=a_{[-k,k]})=\mathcal{L}(\xi_{]-\infty,i[}\,\|\,\eta_{[-k,i[}=a_{[-k,i[},\eta_{i}=1)$
		$\displaystyle\hskip 85.35826pt\otimes\mathcal{L}(\xi_{]i,\infty[}\,\|\,\eta_{]i,k]}=a_{]i,k]},\eta_{i}=1)$
		$\displaystyle=\mathcal{L}(\xi_{]-\infty,i[}\,\|\,\eta_{[-k,k]}=a_{[-k,k]})\otimes\mathcal{L}(\xi_{]i,\infty[}\,\|\,\eta_{]i,k]}=a_{[-k,k]}).$

Therefore, if $\eta_{[-k,k]}$ is chosen so that there exists $i\in[0,\ell_{0}[$ such that $\eta_{i}=1$ , we see that $\xi_{[-m,0[}$ and $\xi_{[\ell_{0},\ell[}$ are independent given $\eta_{[-k,k]}$ .

We are now ready to prove (27). For any $b\in\{0,1\}^{\mathbb{Z}}$ and any $a\in\{0,1\}^{\mathbb{Z}}$ such that there exists $i\in[0,\ell_{0}[$ such that $a_{i}=1$ , the fact that $\xi_{[-m,0]}$ and $\xi_{[\ell_{0},\ell[}$ are relatively independent given $\{\eta_{[-k,k]}=a_{[-k,k]}\}$ implies that the measures $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]})$ and $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})$ have the same marginal on the coordinates of $[\ell_{0},\ell[$ . So the relative product of those measures over $\xi_{[\ell_{0},\ell[}$ is a coupling under which the copies of $\xi_{[\ell_{0},\ell[}$ coincide. It follows that

\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)\leq\ell_{0}/\ell\leq\varepsilon.

(29)

By combining (28) and (29), we can conclude that

\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\lambda(a,b)\leq 2\varepsilon.

∎

Proof of Theorem 4.2.

First of all, (26) follows directly from Proposition 4.3. Next, from Proposition 4.5, it follows that $\mathscr{F}_{0}$ is relatively very weak Bernoulli over $\mathscr{F}_{n}$ , so $\mathscr{F}_{n+1}$ is relatively very weak Bernoulli over $\mathscr{F}_{n}$ (by part (iii) of Lemma 2.17), so $\mathscr{F}_{n+1}$ is relatively Bernoulli over $\mathscr{F}_{n}$ (by part (i) of Lemma 2.17). ∎

4.2 A cellular automaton on Ornstein’s K-process

Here, we consider the non-Bernoulli K-system introduced by Ornstein in [15]. A more detailed presentation of this system is given in [16, Part III], but we give a sketch of the construction for completeness. It is a process defined on the alphabet $\{0,e,f,s\}$ . We set $h(r)$ , $s(r)$ and $f(r)$ to be integers depending on $r\in\mathbb{N}$ used in the construction of the process. For $r\geq 1$ , an $r$ -block is a random sequence of length $h(r)$ on the alphabet $\{0,e,f,s\}$ , whose law we define inductively.

To get a $1$ -block, take $k_{1}\in\llbracket 1,f(1)-1\rrbracket$ chosen uniformly at random, and consider a sequence that starts with a string of $k_{1}$ << $f$ >>, followed by a string of $h(0)$ << $0$ >>, and ends with a string of $f(1)-k_{1}$ << $e$ >>:

This construction implies that $h(1)=h(0)+f(1)$ .

To get an $r$ -block, take $k_{r}\in\llbracket 1,f(r)-1\rrbracket$ chosen uniformly at random, and $2^{r}$ i.i.d. random variables $(\xi^{(r-1)}_{i})_{i\in\llbracket 1,2^{r}\rrbracket}$ such that each $\xi^{(r-1)}_{i}$ is an $(r-1)$ -block. The $r$ -block is then built as follows:

So an $r$ -block starts with a string of $k_{r}$ << $f$ >>, and ends with a string of $f(r)-k_{r}$ << $e$ >>. In between, we put all the $(r-1)$ -blocks separated by strings of << $s$ >> so that each $\xi_{i}^{(r-1)}$ is placed in between two strings of << $s$ >> of respective lengths $is(r)$ and $(i+1)s(r)$ . In particular, $h(r)$ is entirely determined by $h(r-1)$ , $f(r)$ and $s(r)$ .

Ornstein’s K-system is then built by constructing an increasing sequence of towers $(\mathcal{T}_{r})_{r\geq 1}$ such that $X:=\bigcup_{r\geq 1}\mathcal{T}_{r}$ . A tower $\mathcal{T}_{r}$ is given by its base $F_{r}$ for which the sets $\{T^{i}F_{r}\}_{i\in[0,h(r)[}$ are disjoint and

\mathcal{T}_{r}:=\bigsqcup_{i=0}^{h(r)-1}T^{i}F_{r}.

Through a cutting and stacking method, Ornstein builds in [15] the towers $(\mathcal{T}_{r})_{r\geq 1}$ along with a process $\xi$ so that the law of $\xi_{[0,h(r)[}$ given $F_{r}$ is the law of an $r$ -block. In other words, this means that the columns of the form

C_{\mathbf{a}}:=\bigsqcup_{i=0}^{h(r)-1}T^{i}(F_{r}\cap\{\xi_{[0,h(r)[}=\mathbf{a}\}),\,\text{ for }\mathbf{a}\in\{0,e,f,s\}^{h(r)},

partition $\mathcal{T}_{r}$ according to the law of an $r$ -block. Denote $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ the resulting dynamical system. A proper choice of $h(r)$ , $s(r)$ and $f(r)$ assures that this construction gives a finite measure. Then $\xi$ is a factor map onto the system

\mathbf{Y}:=(\{0,e,f,s\}^{\mathbb{Z}},\mathscr{B},\nu,S),

where $\nu$ is the law of $\xi$ .

Since $\xi$ is a process on the alphabet $\{0,e,f,s\}$ , the local function (25) becomes:

\begin{array}[]{cccc}\tau_{0}:&\{0,e,f,s\}^{2}&\longrightarrow&\{0,e,f,s\}\\ &(\alpha_{1},\alpha_{2})&\mapsto&\left\{\begin{array}[]{ll}\alpha_{1}&\text{ if }\,\alpha_{1}=\alpha_{2}\\ 0&\text{ otherwise}.\end{array}\right.\end{array}

From now on, $\tau$ denotes the corresponding cellular automaton. Similarly to what we did in Section 4.1, we prove

Theorem 4.6.

On the system $\mathbf{Y}:=(\{0,e,f,s\}^{\mathbb{Z}},\mathscr{B},\nu,S)$ , the filtration given by $\mathscr{F}:=(\sigma(\tau^{|n|}))_{n\leq 0}$ is a weak Pinsker filtration. That is, for every $n\leq-1$ , $\mathscr{F}_{n+1}$ is relatively Bernoulli over $\mathscr{F}_{n}$ and we have

h_{\nu}(\mathscr{F}_{n})\underset{n\rightarrow-\infty}{\longrightarrow}0.

(30)

The overall structure of the proof will resemble Section 4.1, but the details are adapted to the specific structure of Ornstein’s process. First, the convergence to $0$ of the entropy follows from Proposition 4.1. We could also adapt the proof of Proposition 4.3 to get that convergence, but it does not give a better rate of convergence than Proposition 4.1, so we do not give any details.

Proposition 4.7.

If $\xi$ is the process defined above, then for every $n\geq 1$ , $\xi$ is relatively very weak Bernoulli over $\tau^{n}\xi$ .

Proof.

We set $\eta:=\tau^{n}\xi$ . Let $\varepsilon>0$ . Once again, we need to show that there exists $\ell\geq 1$ such that for every $m\geq 1$ and for $k\geq 1$ large enough, we have

\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\lambda(a,b)\leq\varepsilon,

where $\lambda$ is the law of $(\eta,\xi)$ and, for $I,J\subset\mathbb{Z}$ , $\lambda_{\ell}(\cdot\,|\,a_{I},b_{J})$ is the conditional law of $\xi_{[0,\ell[}$ given that $\eta_{I}$ equals $a_{I}$ and that $\xi_{J}$ equals $b_{J}$ .

Let $m\geq 1$ . We choose $r$ so that $s(r+1)\geq n+1$ . By construction of $\xi$ , we know that for any $r$ -block in $\xi$ , there exists $i\in[1,2^{r+1}]$ such that the said $r$ -block will come after a string of $i\cdot s(r+1)$ << $s$ >> and be followed by a string of $(i+1)\cdot s(r+1)$ << $s$ >>. Therefore, by knowing the positions of all the strings of << $s$ >> longer that $s(r+1)$ , we know the position of every $r$ -block.

However, since we chose to have $s(r+1)\geq n+1$ , we can say that, for $k\in\mathbb{Z}$ , we have $\xi_{[k,k+s(r+1)[}=(s,...,s)$ if and only if $\eta_{[k,k+s(r+1)-n[}=(s,...,s)$ . This means that the positions of the $r$ -blocks contained on a segment $[k_{1},k_{2}]$ are $\eta_{[k_{1}-N,k_{2}+N]}$ -measurable, for $N$ large enough (for example $N=(2^{r+1}+1)s(r+1)$ ).

By choosing $r$ large enough, we can also assume that $\mu(\mathcal{T}_{r})\geq 1-\varepsilon/2$ . Using Birkhoff’s ergodic theorem, for $\ell$ large enough, the set

A:=\left\{x\in X;\,\frac{1}{\ell}\sum_{j=0}^{\ell-1}\mathbbm{1}_{\mathcal{T}_{r}}(T^{j}(x))>1-\varepsilon\right\},

satisfies $\mu(A)>1-\varepsilon$ .

In other words, for $x\in A$ , the number of elements in the sequence $\xi_{[0,\ell[}(x)$ that are part of an $r$ -block is greater than $(1-\varepsilon)\ell$ . However, among the intervals on which those $r$ -blocks are supported, two of them may not be included in $[0,\ell[$ , and can intersect $\mathbb{Z}\backslash[0,\ell[$ . But, if $h(r)/\ell\leq\varepsilon/2$ , there are at most $\varepsilon\ell$ elements in those two intervals. To sum up, we get that the number of elements in the sequence $\xi_{[0,\ell[}(x)$ that are part of an $r$ -block, and for which the position of that $r$ -block is contained on the segment $[0,\ell[$ , is greater than $(1-2\varepsilon)\ell$ . Then, we choose $k\geq 1$ so that the positions of the $r$ -blocks contained in $[-m,\ell[$ are $\eta_{[-k,k]}$ -measurable (in particular, $A$ is $\eta_{[-k,k]}$ -measurable). So we have the following configuration for $\xi_{[-m,\ell[}$ :

[Uncaptioned image]

where the $B_{i}$ are the positions of the $r$ -blocks supported on $[0,\ell[$ , and we have shown that $\#\bigsqcup_{i=1}^{p}B_{i}\geq(1-2\varepsilon)\ell.$

Denote by $I_{\ell}:=\{B_{i}\}_{1\leq i\leq p}$ the random variable that gives the positions of the $r$ -blocks on the segment $[0,\ell[$ . By construction of $\xi$ , we know that, given $I_{\ell}$ , for any $r$ -block $B$ , the variables $\xi_{B}$ and $\xi_{B^{c}}$ are independent. Moreover, we know that any $r$ -block is between two strings of at least $n+1$ << $s$ >>. Therefore, we see that if $I_{\ell}$ is fixed, for any $r$ -block $B$ , $\eta_{B}$ is $\xi_{B}$ -measurable and $\eta_{B^{c}}$ is $\xi_{B^{c}}$ -measurable.

Let us give details on the proof of that last claim: we write $B^{c}$ as the union of $B^{-}$ and $B^{+}$ , the infinite intervals that come before and after $B$ respectively. Given the structure of our automaton, it is always true that $\eta_{B^{+}}$ is $\xi_{B^{+}}$ -measurable. At the boundary between $B^{-}$ and $B$ , we have the following configuration:

[Uncaptioned image]

Indeed, in the construction of the blocks, we see that $\xi$ must put an << $f$ >> in the first box of $B$ . Therefore, we must have << $0$ >> in the red boxes. So, the values that $\eta$ takes on the $n+1$ boxes preceding $B$ are determined. For the rest of the boxes of $B^{-}$ , it comes from the structure of $\tau$ that the values of $\eta$ are determined by $\xi_{B^{-}}$ since we are at a distance $\geq n+1$ from $B$ . So we have shown that $\eta_{B^{-}}$ is $(\xi_{B^{-}})$ -measurable. A similar reasoning at the boundary between $B$ and $B^{+}$ shows that $\eta_{B}$ is $\xi_{B}$ -measurable. And since it is always true that $\eta_{B^{+}}$ is $\xi_{B^{+}}$ -measurable, we have proven that $\eta_{B}$ is $\xi_{B}$ -measurable and $\eta_{B^{c}}$ is $\xi_{B^{c}}$ -measurable.

But, we also know from the structure of $\xi$ that, given $I_{\ell}$ , $\xi_{B}$ and $\xi_{B^{c}}$ are independent. The previous paragraph enables us to use Lemma 4.4 to extend that to: given $I_{\ell}\vee\eta_{[-k,k]}$ , $\xi_{B}$ and $\xi_{B^{c}}$ are independent. Finally, since $I_{\ell}$ is $\eta_{[-k,k]}$ -measurable, this yields that $\xi_{B}$ and $\xi_{B^{c}}$ are relatively independent given $\eta_{[-k,k]}$ .

This independence tells us that, for every sequences $a$ and $b$ , $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]})$ and $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})$ have the same marginals on the coordinates of the $r$ -blocks $B$ contained in $[0,\ell[$ . Moreover, if $a$ is chosen so that $\{\eta_{[-k,k]}=a_{[-k,k]}\}$ is a subset of $A$ , we know that the positions of the $r$ -blocks cover at least $(1-2\varepsilon)\ell$ elements in $[0,\ell[$ . Then, by considering the relative product of $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]})$ and $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})$ over $\{\xi_{B_{i}}\}_{1\leq i\leq p}$ , we get:

\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)\leq 2\varepsilon.

Finally, since $\mu(A)\geq 1-\varepsilon$ , this yields

\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\nu(a,b)\leq 3\varepsilon.

∎

Remark 4.8.

We see that the proofs of Theorem 4.6 and Theorem 4.2 are very similar. In both cases, we have a process $\xi$ , whose conditional law given $\tau^{n}\xi$ is made of random blocks separated by deterministic blocks, and the random blocks are filled independently from each other. The main difference that prevents Ornstein’s K-process from being Bernoulli is that the position of $r$ -blocks is determined by the long sequences of << $s$ >>, and this creates correlations over long distances. But once we condition by $\tau^{n}\xi$ , those sequences of << $s$ >> are entirely determined. Therefore we are left with filling independently all the $r$ -blocks, and the past has no longer a significant influence on the future.

In that sense, when we look at the relative structure of Ornstein’s K-process over $\tau^{n}$ , the non-Bernoulli aspects disappear. However, when we look at the asymptotic properties of the weak Pinsker filtration obtained by applying $\{\tau^{n}\}_{n\geq 1}$ , whether we start with a Bernoulli process or with a non-Bernoulli K-process, we get different results. Therefore, getting a better understanding of the classification of the various properties of weak Pinsker filtrations could help to develop a new classification of non-Bernoulli K-systems.

Acknowledgements

The author thanks Jean-Paul Thouvenot for the fruitful discussions, the insights and the ongoing exchanges regarding the present work.

References

[1] Tim Austin. Measure concentration and the weak Pinsker property. Publ. Math., Inst. Hautes Étud. Sci., 128:1–119, 2018.
[2] Séverin Benzoni. Confined extensions and non-standard dynamical filtrations. Stud. Math., 276(3):233–270, 2024.
[3] P. Billingsley. Ergodic theory and information. John Wiley & Sons, Hoboken, NJ, 1965.
[4] I. P. Cornfeld, S. V. Fomin, and Ya. G. Sinai. Ergodic theory. Transl. from the Russian by A. B. Sossinskii., volume 245. Springer, Berlin, 1982.
[5] M. Émery and W. Schachermayer. On Vershik’s standardness criterion and Tsirelson’s notion of cosiness. In Séminaire de Probabilités XXXV, pages 265–305. Berlin: Springer, 2001.
[6] Eli Glasner. Ergodic theory via joinings, volume 101. Providence, RI: American Mathematical Society (AMS), 2003.
[7] Marshall Hall, Jr. Combinatorial theory. Wiley Classics Library. John Wiley & Sons, Inc., New York, second edition, 1998. A Wiley-Interscience Publication.
[8] W. Krieger. On entropy and generators of measure-preserving transformations. Trans. Am. Math. Soc., 149:453–464, 1970.
[9] Paul Lanthier. Aspects ergodiques et algébriques des automates cellulaires. PhD thesis, Université de Rouen Normandie, 2020.
[10] Paul Lanthier and Thierry De La Rue. Classification of backward filtrations and factor filtrations: examples from cellular automata. Ergodic Theory and Dynamical Systems, 42(9):2890–2922, 2022.
[11] D. Ornstein. Bernoulli shifts with the same entropy are isomorphic. Adv. Math., 4:337–352, 1970.
[12] Donald Ornstein. Two Bernoulli shifts with infinite entropy are isomorphic. Adv. Math., 5:339–348, 1971.
[13] Donald S. Ornstein. A K-automorphism with no square root and Pinsker’s conjecture. Adv. Math., 10:89–102, 1973.
[14] Donald S. Ornstein. A mixing transformation for which Pinsker’s conjecture fails. Adv. Math., 10:103–123, 1973.
[15] Donald S. Ornstein. An example of a Kolmogorov automorphism that is not a Bernoulli shift. Adv. Math., 10:49–62, 1973.
[16] Donald S. Ornstein. Ergodic theory, randomness, and dynamical systems. Yale Mathematical Monographs. 5. New Haven - London: Yale University Press. VII, 141 p. £2.50 (1974)., 1974.
[17] Donald S. Ornstein and Benjamin Weiss. Finitely determined implies very weak Bernoulli. Isr. J. Math., 17:94–104, 1974.
[18] William Parry. Topics in ergodic theory. Reprint of the 1981 original, volume 75 of Camb. Tracts Math. Cambridge, MA: Cambridge University Press, 2004.
[19] M. S. Pinsker. Dynamical systems with completely positive or zero entropy. Sov. Math., Dokl., 1:937–938, 1960.
[20] M. Rahe. Relatively finitely determined implies relatively very weak Bernoulli. Can. J. Math., 30:531–548, 1978.
[21] C. E. Shannon. A mathematical theory of communication. Bell System Tech. J., 27:379–423, 623–656, 1948.
[22] Paul Shields. The theory of Bernoulli shifts. Chicago Lectures in Mathematics. Chicago - London: The University of Chicago Press. X, 118 p. £1.35; cloth £3.55 (1973)., 1973.
[23] Ja. G. Sinaĭ. On a weak isomorphism of transformations with invariant measure. Mat. Sb. (N.S.), 63 (105):23–42, 1964.
[24] J.-P. Thouvenot. On the stability of the weak Pinsker property. Isr. J. Math., 27:150–162, 1977.
[25] J.-P. Thouvenot. Two facts concerning the transformations which satisfy the weak Pinsker property. Ergodic Theory Dyn. Syst., 28(2):689–695, 2008.
[26] Jean-Paul Thouvenot. Quelques propriétés des systèmes dynamiques qui se decomposent en un produit de deux systèmes dont l’un est un schema de Bernoulli. Isr. J. Math., 21:177–207, 1975.
[27] Jean-Paul Thouvenot. Entropy, isomorphism and equivalence in ergodic theory. In Handbook of dynamical systems. Volume 1A, pages 205–238. Amsterdam: North-Holland, 2002.

	$\displaystyle\bar{d}_{\ell}(\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,\|\,\mathcal{Q}=\boldsymbol{h})$	$\displaystyle,\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,\|\,\mathcal{Q}=\boldsymbol{h}_{0}))$
		$\displaystyle\leq\frac{n}{n_{1}-2n_{0}}\bar{d}_{n}(\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,\|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,\|\,\mathcal{Q}=\boldsymbol{h}_{0}))$
		$\displaystyle\leq\frac{n}{(1-\gamma)n-n_{0}}2\varepsilon^{3}/3<\varepsilon^{3},$

	$\displaystyle\varepsilon^{3}\geq$	$\displaystyle\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\geq\int_{\boldsymbol{p}_{1}\notin\mathcal{A}_{\ell}}d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})$
		$\displaystyle=\int_{\boldsymbol{p}\notin\mathcal{A}_{\ell}}\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2}\,\|\,\boldsymbol{p}_{1}=\boldsymbol{p})d\lambda_{1}(\boldsymbol{p})$
		$\displaystyle\geq\int_{\boldsymbol{p}\notin\mathcal{A}_{\ell}}\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})>\varepsilon\,\|\,\boldsymbol{p}_{1}=\boldsymbol{p})\cdot\varepsilon d\lambda_{1}(\boldsymbol{p})$
		$\displaystyle>\varepsilon^{2}\mu(\tilde{\mathcal{P}}_{L_{n}}\notin\mathcal{A}_{\ell}\,\|\,\mathcal{Q}=\boldsymbol{h}),$

	$\displaystyle\#I$	$\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I,\xi_{[0,n[}\in\mathcal{B}_{n})$
		$\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I\,\|\,\xi_{[0,n[}\in\mathcal{B}_{n})$
		$\displaystyle=2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I\,\|\,\mathcal{Q}=\boldsymbol{h}),\,\text{ because }\xi\perp\!\!\!\perp\mathcal{H}$
		$\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\tilde{\mathcal{P}}_{L_{n}}\in K\,\|\,\mathcal{Q}=\boldsymbol{h}),$

	$\displaystyle\mathbb{E}[A^{\prime}\cdot B^{\prime}\cdot U^{\prime}\cdot V^{\prime}\,\|\,\mathscr{Z}]$	$\displaystyle=\mathbb{E}[A^{\prime}\cdot U^{\prime}\,\|\,\mathscr{Z}]\cdot\mathbb{E}[B^{\prime}\cdot V^{\prime}\,\|\,\mathscr{Z}]$
		$\displaystyle=\mathbb{E}[\mathbb{E}[A^{\prime}\,\|\,U,\mathscr{Z}]\cdot U^{\prime}\,\|\,\mathscr{Z}]\cdot\mathbb{E}[\mathbb{E}[B^{\prime}\,\|\,V,\mathscr{Z}]\cdot V^{\prime}\,\|\,\mathscr{Z}]$
		$\displaystyle=\mathbb{E}[\mathbb{E}[A^{\prime}\,\|\,U,\mathscr{Z}]\cdot U^{\prime}\cdot\mathbb{E}[B^{\prime}\,\|\,V,\mathscr{Z}]\cdot V^{\prime}\,\|\,\mathscr{Z}].$

	$\displaystyle\mathcal{L}(\xi_{]-\infty,i[},\xi_{]i,\infty[}\,\|\,$	$\displaystyle\eta_{[-k,k]}=a_{[-k,k]})=\mathcal{L}(\xi_{]-\infty,i[}\,\|\,\eta_{[-k,i[}=a_{[-k,i[},\eta_{i}=1)$
		$\displaystyle\hskip 85.35826pt\otimes\mathcal{L}(\xi_{]i,\infty[}\,\|\,\eta_{]i,k]}=a_{]i,k]},\eta_{i}=1)$
		$\displaystyle=\mathcal{L}(\xi_{]-\infty,i[}\,\|\,\eta_{[-k,k]}=a_{[-k,k]})\otimes\mathcal{L}(\xi_{]i,\infty[}\,\|\,\eta_{]i,k]}=a_{[-k,k]}).$

Introduction to weak Pinsker filtrations

Abstract

1 Introduction

2 Weak Pinsker filtrations and related questions

2.1 Basic notation

2.2 Dynamical filtrations

Definition 2.1.

Definition 2.2.

Definition 2.3.

2.3 Reminders on KS-entropy

Definition 2.4 (Shannon entropy).

Lemma 2.5 (Fano’s inequality).

Definition 2.6 (Kolmogorov-Sinaï entropy).

Lemma 2.7.

Proof.

Lemma 2.8.

Theorem 2.9 (Kolmogorov-Sinaï).

2.4 Ornstein’s theory and its relative version

Definition 2.10 (Bernoulli and relatively Bernoulli).

Remark 2.11.

Theorem 2.12 (Ornstein [11, 12]).

Definition 2.13 (Very weak Bernoulli).

Theorem 2.14 (see [16, 17]).

Definition 2.15 (Relatively very weak Bernoulli).

Theorem 2.16 (see [20]).

Lemma 2.17.

Proof.

Theorem 2.18 (See [8]).

2.5 Positive entropy systems and weak Pinsker filtrations

Theorem 2.19 (Austin, 2018, [1]).

Definition 2.20.

Proposition 2.21.

Proposition 2.22.

Proof.

Theorem 2.23.

Proof.

Question 2.24.

2.6 The uniqueness problem

Theorem 2.25 (From Thouvenot in [25]).

Proof.

Question 2.26.

Question 2.27.

3 Uniqueness problem on Bernoulli systems

Theorem 3.1.

Proposition 3.2.

3.1 The technical lemma

Definition 3.3.

Lemma 3.4 (Hall’s marriage lemma [7]).

Theorem 3.5.

Proposition 3.6 (See [22]).

Lemma 3.7.

Proof of Lemma 3.7.

Step 1: The setup of the tower

Step 2: Using the extremality of 𝒫\mathcal{P}

Step 3: Framework for the construction of ξ~0\tilde{\xi}_{0}

Step 4: Estimates for Hall’s marriage lemma

Step 5: Proving that ξ~\tilde{\xi} satsifies (i), (ii) and (iii)

3.2 Application of the technical lemma

Proposition 3.8.

Proposition 3.9.

Proof of Proposition 3.2.

3.3 Proof of Theorem 3.1

Proof of Theorem 3.1.

4 Examples of weak Pinsker filtrations generated by a cellular automaton

Proposition 4.1.

Proof.

4.1 A cellular automaton on a Bernoulli shift

Theorem 4.2.

Proposition 4.3.

Proof.

Lemma 4.4.

Proof.

Proposition 4.5.

Proof.

Proof of Theorem 4.2.

4.2 A cellular automaton on Ornstein’s K-process

Theorem 4.6.

Proposition 4.7.

Proof.

Remark 4.8.

Step 2: Using the extremality of $\mathcal{P}$

Step 3: Framework for the construction of $\tilde{\xi}_{0}$

Step 5: Proving that $\tilde{\xi}$ satsifies (i), (ii) and (iii)