When are Kalman-Filter Restless Bandits Indexable?

Christopher Dance and Tomi Silander
Xerox Research Centre Europe, Grenoble, France

(May, 2015)

Abstract

We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.

1 Introduction

We study the problem of monitoring several time series so as to maintain a precise belief while minimising the cost of sensing. Such problems can be viewed as POMDPs with belief-dependent rewards [3] and their applications include active sensing [7], attention mechanisms for multiple-object tracking [22], as well as online summarisation of massive data from time-series [4]. Specifically, we discuss the restless bandit [24] associated with the discrete-time Kalman filter [19].

Restless bandits generalise bandit problems [6, 8] to situations where the state of each arm (project, site or target) continues to change even if the arm is not played. As with bandit problems, the states of the arms evolve independently given the actions taken, suggesting that there might be efficient algorithms for large-scale settings, based on calculating an index for each arm, which is a real number associated with the (belief-)state of that arm alone. However, while bandits always have an optimal index policy (select the arm with the largest index), it is known that no index policy can be optimal for some discrete-state restless bandits [17] and such problems are in general PSPACE-hard even to approximate to any non-trivial factor [10]. Further, in this paper we address restless bandits with real-valued rather than discrete states.

On the other hand, Whittle proposed a natural index policy for restless bandits [24], but this policy only makes sense when the restless bandit is indexable, as we now explain. Say we have $n$ restless bandits and we are constrained to play $m$ arms at each time. Whittle considered relaxing this constraint by only requiring that the time-average number of arms played is $m$ . Now the optimal average cost for this relaxed problem is a lower bound on the optimal average cost for the original problem. Also, the relaxed problem can be separated into $n$ single-arm problems by the method of Lagrange multipliers, making it relatively easy to solve. In this separated version of the relaxed problem, each arm behaves identically to an arm in the original problem, except that an additional price $\lambda$ is charged each time the arm is played, where $\lambda$ corresponds to the Lagrange multiplier for the relaxed constraint. Now let us consider a family of optimal policies which achieves the optimal cost-to-go $Q_{i}(x,u;\lambda)$ for a single arm $i$ with price $\lambda$ and which takes actions $u=\pi_{i}(x;\lambda)$ when in state $x$ where $u=0$ means passive and $u=1$ means active. At first glance, we might intuitively suppose that it becomes less and less attractive to be active as the price $\lambda$ increases so that as the price is increased beyond some value $\lambda_{i}(x)$ , the optimal action switches from active to passive. At this price we are ambivalent between being active and passive so that $Q_{i}(x,0;\lambda_{i}(x))=Q_{i}(x,1;\lambda_{i}(x))$ . Such a value $\lambda_{i}(x)$ is called the Whittle index for arm $i$ in state $x$ . Indeed if there is a family of optimal policies for which

\displaystyle\pi_{i}(x;\lambda_{\text{hi}})\leq\pi_{i}(x;\lambda_{\text{lo}})\quad\text{for all states $x$ and all pairs of prices $\lambda_{\text{hi}}\geq\lambda_{\text{lo}}$}

then an optimal solution to the relaxed problem for price $\lambda$ is to activate arm $i$ if and only if $\lambda<\lambda_{i}(x)$ . If a restless bandit satisfies this condition, it is said to be indexable. It is important to note that some restless bandits are not indexable, so activating arm $i$ if and only if $\lambda<\lambda_{i}(x)$ does not correspond to an optimal solution to the relaxed problem. Indeed, in a study of small randomly-generated problems, Weber and Weiss [23] found that roughly 10% of problems were not indexable.

As a policy based on $\lambda_{i}(x)$ is so good for the relaxed problem when the arms are indexable, this motivates us to use $\lambda_{i}(x)$ as a heuristic for the original problem. This heuristic is called Whittle’s index policy and at each time it activates the $m$ arms with the highest indexes $\lambda_{i}(x)$ . Further motivation for studying indexability is that for ordinary bandits the Whittle index reduces to the Gittins index, making the Whittle index policy optimal when only one arm may be active at each time, that is when $m=1$ . More generally, Whittle’s index policy is not optimal for some restless bandit problems even when the arms are indexable, but indexability is still a rather useful concept, since if all arms are indexable and certain other conditions hold, Whittle’s policy is asymptotically optimal, as we now explain. Consider a sequence of restless bandit problems parameterised by the number of indexable arms $n$ and in which $m=\alpha n$ of the arms can be simultaneously active for some fixed $\alpha\in(0,1)$ . Then as $n$ tends to infinity, the time-average cost per arm for Whittle’s index policy converges to the time-average cost per arm for an optimal policy, provided a certain fluid approximation has a unique fixed point. This result was first demonstrated by Weber and Weiss [23] who for simplicity of exposition only considered the symmetric case in which the $n$ arms have identical costs and transition probabilities. Recently, Verloop [20] extended this result to asymmetric cases involving multiple types of arms. Interestingly, this extension also covers cases where new arms arrive and old arms depart.

Restless bandits associated with scalar Kalman(-Bucy) filters in continuous time were recently shown to be indexable [12] and the corresponding discrete-time problem has attracted considerable attention over a long period [15, 11, 16, 21]. However, that attention has produced no satisfactory proof of indexability – even for scalar time-series and even if we assume that there is a monotone optimal policy for the single-arm problem, which is a policy that plays the arm if and only if the relevant belief-state exceeds some threshold (here the relevant belief-state is a posterior variance). Theorem 1 of this paper addresses that gap. After formalising the problem (Section 2), we describe the concepts and intuition (Section 3) behind the main result (Section 4). The main tools are mechanical words (which are not sufficiently well-known) and Schur convexity. As these tools are associated with rather general theorems, we believe that future work (Section 5) should enable substantial generalisation of our results.

2 Problem and Index

We consider the problem of tracking $N$ time-series, which we call arms, in discrete time. The state $Z_{i,t}\in\mathbb{R}$ of arm $i$ at time $t\in\mathbb{Z}_{+}$ evolves as a standard-normal random walk independent of everything but its immediate past ( $\mathbb{Z}_{+},\mathbb{R}_{-}$ and $\mathbb{R}_{+}$ all include zero). The action space is $\mathcal{U}:=\{1,\dots,N\}$ . Action $u_{t}=i$ makes an expensive observation $Y_{i,t}$ of arm $i$ which is normally-distributed about $Z_{i,t}$ with precision $b_{i}\in\mathbb{R}_{+}$ and we receive cheap observations $Y_{j,t}$ of each other arm $j$ with precision $a_{j}\in\mathbb{R}_{+}$ where $a_{j}<b_{j}$ and $a_{j}=0$ means no observation at all.

Let $Z_{t},Y_{t},\mathcal{H}_{t},\mathcal{F}_{t}$ be the state, observation, history and observed history, so that $Z_{t}:=(Z_{1,t},\dots,Z_{N,t}),Y_{t}:=(Y_{1,t},\dots,Y_{N,t}),\mathcal{H}_{t}:=((Z_{0},u_{0},Y_{0}),\dots,(Z_{t},u_{t},Y_{t}))$ and $\mathcal{F}_{t}:=((u_{0},Y_{0}),\dots,(u_{t},Y_{t})).$ Then we formalise the above as ( ${\bf 1}_{\cdot}$ is the indicator function)

\displaystyle Z_{i,0}\sim\mathcal{N}(0,1),

\displaystyle Z_{i,t+1}\mid\mathcal{H}_{t}\sim\mathcal{N}(Z_{i,t},1),

\displaystyle Y_{i,t}\mid\mathcal{H}_{t-1},Z_{t},u_{t}\sim\mathcal{N}\left(Z_{i,t},\frac{{\bf 1}_{u_{t}\neq i}}{a_{i}}+\frac{{\bf 1}_{u_{t}=i}}{b_{i}}\right).

Note that this setting is readily generalised to $\mathbb{E}[(Z_{i,t+1}-Z_{i,t})^{2}]\neq 1$ by a change of variables.

Thus the posterior belief is given by the Kalman filter as $Z_{i,t}\mid\mathcal{F}_{t}\sim\mathcal{N}(\hat{Z}_{i,t},x_{i,t})$ where the posterior mean is ${\hat{Z}}_{i,t}\in\mathbb{R}$ and the error variance $x_{i,t}\in\mathbb{R}_{+}$ satisfies

\displaystyle x_{i,t+1}=\phi_{i,{\bf 1}_{u_{t+1}=i}}(x_{i,t})\quad\text{where}\quad\phi_{i,0}(x):=\frac{x+1}{a_{i}x+a_{i}+1}\ \ \text{and}\ \ \phi_{i,1}(x):=\frac{x+1}{b_{i}x+b_{i}+1}.

(1)
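As a quick numerical illustration of the update (1), the following Python sketch iterates the two maps for a single arm. The function names and the precisions $a=0.1$, $b=1$ are illustrative assumptions, not values taken from the paper.

```python
def phi(x: float, precision: float) -> float:
    """Error-variance update (1): a random-walk step, then one Gaussian observation."""
    predicted = x + 1.0  # the random walk adds unit variance
    return predicted / (precision * predicted + 1.0)  # = (x+1)/(p*x + p + 1)


def fixed_point(precision: float, start: float = 1.0, iters: int = 200) -> float:
    """Iterate the update until it settles (plain iteration, assumed to converge)."""
    x = start
    for _ in range(iters):
        x = phi(x, precision)
    return x


a, b = 0.1, 1.0          # illustrative cheap and expensive precisions, a < b
y0 = fixed_point(a)      # variance reached by always observing cheaply
y1 = fixed_point(b)      # variance reached by always observing expensively
print(y1, y0)
```

With these values the always-active fixed point lies below the always-passive one, which is the ordering $y_{1}<y_{0}$ used later in the text.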

Problem KF1. Let $\pi$ be a policy so that $u_{t}=\pi(\mathcal{F}_{t-1})$ . Let $x^{\pi}_{i,t}$ be the error variance under $\pi$ . The problem is to choose $\pi$ so as to minimise the following objective for discount factor $\beta\in[0,1)$ . The objective consists of a weighted sum of error variances $x^{\pi}_{i,t}$ with weights $w_{i}\in\mathbb{R}_{+}$ plus observation costs $h_{i}\in\mathbb{R}_{+}$ for $i=1,\dots,N$ :

\displaystyle\mathbb{E}\left[\sum_{t=0}^{\infty}\sum_{i=1}^{N}\beta^{t}\left\{h_{i}{\bf 1}_{u_{t}=i}+w_{i}x_{i,t}^{\pi}\right\}\right]=\sum_{t=0}^{\infty}\sum_{i=1}^{N}\beta^{t}\left\{h_{i}{\bf 1}_{u_{t}=i}+w_{i}x_{i,t}^{\pi}\right\}

where the equality follows as (1) is a deterministic mapping (and assuming $\pi$ is deterministic).

Single-Arm Problem and Whittle Index. Now fix an arm $i$ and write $x_{t}^{\pi},\phi_{0}(\cdot),\dots$ instead of $x_{i,t}^{\pi},\phi_{i,0}(\cdot),\dots$ . Say there are now two actions $u_{t}=0,1$ corresponding to cheap and expensive observations respectively and the expensive observation now costs $h+\nu$ where $\nu\in\mathbb{R}$ . The single-arm problem is to choose a policy, which here is an action sequence, $\pi:=(u_{0},u_{1},\dots)$

\displaystyle\text{so as to minimise}\quad V^{\pi}(x|\nu):=\sum_{t=0}^{\infty}\beta^{t}\left\{(h+\nu)u_{t}+wx_{t}^{\pi}\right\}\quad\text{where $x_{0}=x$.}

(2)

Let $Q(x,\alpha|\nu)$ be the optimal cost-to-go in this problem if the first action must be $\alpha$ and let $\pi^{*}$ be an optimal policy, so that

\displaystyle Q(x,\alpha|\nu):=(h+\nu)\alpha+wx+\beta V^{\pi^{*}}(\phi_{\alpha}(x)|\nu).

For any fixed $x\in\mathbb{R}_{+}$ , the value of $\nu$ for which actions $u_{0}=0$ and $u_{0}=1$ are both optimal is known as the Whittle index $\lambda^{W}(x)$ , assuming it exists and is unique. In other words, the Whittle index $\lambda^{W}(x)$ is the solution to

\displaystyle Q(x,0|\lambda^{W}(x))=Q(x,1|\lambda^{W}(x)).

(3)

Let us consider a policy which takes action $u_{0}=\alpha$ then acts optimally producing actions $u_{t}^{\alpha*}(x)$ and error variances $x_{t}^{\alpha*}(x)$ . Then (3) gives

\displaystyle\sum_{t=0}^{\infty}\beta^{t}\left\{(h+\lambda^{W}(x))u^{0*}_{t}+wx_{t}^{0*}(x)\right\}=\sum_{t=0}^{\infty}\beta^{t}\left\{(h+\lambda^{W}(x))u^{1*}_{t}+wx_{t}^{1*}(x)\right\}.

Solving this linear equation for the index $\lambda^{W}(x)$ gives

\displaystyle\lambda^{W}(x)=w\frac{\sum_{t=1}^{\infty}\beta^{t}(x_{t}^{0*}(x)-x_{t}^{1*}(x))}{\sum_{t=0}^{\infty}\beta^{t}(u^{1*}_{t}(x)-u^{0*}_{t}(x))}-h.

(4)

Whittle [24] recognised that for his index policy (play the arm with the largest $\lambda^{W}(x)$ ) to make sense, any arm which receives an expensive observation for added cost $\nu$ , must also receive an expensive observation for added cost $\nu^{\prime}<\nu$ . Such problems are said to be indexable. The question resolved by this paper is whether Problem KF1 is indexable. Equivalently, is $\lambda^{W}(x)$ non-decreasing in $x\in\mathbb{R}_{+}$ ?

3 Main Result, Key Concepts and Intuition

We make the following intuitive assumption about threshold (monotone) policies.
A1. For some $x\in\mathbb{R}_{+}$ depending on $\nu\in\mathbb{R}$ , the policy $u_{t}={\bf 1}_{x_{t}\geq x}$ is optimal for problem (2).

Note that under A1, definition (3) means the policy $u_{t}={\bf 1}_{x_{t}>x}$ is also optimal, so we can choose

\displaystyle\left.\begin{aligned} u_{t}^{0*}(x)&:=\begin{cases}0&\text{if $x_{t-1}^{0*}(x)\leq x$}\\ 1&\text{otherwise}\end{cases}&\quad\text{and}\quad x_{t}^{0*}(x)&:=\begin{cases}\phi_{0}(x_{t-1}^{0*}(x))&\text{if $x_{t-1}^{0*}(x)\leq x$}\\ \phi_{1}(x_{t-1}^{0*}(x))&\text{otherwise}\end{cases}\\ u_{t}^{1*}(x)&:=\begin{cases}0&\text{if $x_{t-1}^{1*}(x)<x$}\\ 1&\text{otherwise}\end{cases}&\quad\text{and}\quad x_{t}^{1*}(x)&:=\begin{cases}\phi_{0}(x_{t-1}^{1*}(x))&\text{if $x_{t-1}^{1*}(x)<x$}\\ \phi_{1}(x_{t-1}^{1*}(x))&\text{otherwise}\end{cases}\end{aligned}\quad\right\}

(5)

where $x_{0}^{0*}(x)=x_{0}^{1*}(x)=x$ . We refer to $x_{t}^{0*}(x),x_{t}^{1*}(x)$ as the $x$ -threshold orbits (Figure 1).
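The two $x$-threshold orbits of (5) can be simulated directly. The sketch below is a minimal illustration: the helper names and the values $a=0.1$, $b=1$, $x=1$ are assumptions chosen for the example, and list position $k$ holds the action $u_{k+1}$ and variance $x_{k+1}$ of (5).

```python
def phi(x, precision):
    """Variance update (1): (x + 1) / (precision * x + precision + 1)."""
    return (x + 1.0) / (precision * x + precision + 1.0)


def threshold_orbit(x, passive_on_tie, a, b, steps):
    """Follow the threshold-x policy of (5) starting from variance x.

    passive_on_tie=True gives the orbit x^{0*} (passive when variance <= x);
    passive_on_tie=False gives the orbit x^{1*} (passive only when variance < x).
    """
    actions, variances, current = [], [], x
    for _ in range(steps):
        passive = current <= x if passive_on_tie else current < x
        action = 0 if passive else 1
        current = phi(current, b if action == 1 else a)
        actions.append(action)
        variances.append(current)
    return actions, variances


a, b, x = 0.1, 1.0, 1.0  # illustrative precisions and threshold
actions0, orbit0 = threshold_orbit(x, True, a, b, 6)
actions1, orbit1 = threshold_orbit(x, False, a, b, 6)
print("".join(map(str, actions0)))  # action word of the passive-first orbit
print("".join(map(str, actions1)))  # action word of the active-first orbit
```

For this particular threshold both orbits alternate between the two actions, one starting passive and the other starting active.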

We are now ready to state our main result.

Theorem 1. Suppose a threshold policy (A1) is optimal for the single-arm problem (2). Then Problem KF1 is indexable. Specifically, for any $b>a\geq 0$ let

\displaystyle\phi_{0}(x):=\frac{x+1}{ax+a+1},

\displaystyle\phi_{1}(x):=\frac{x+1}{bx+b+1}

and for any $w\in\mathbb{R}_{+},h\in\mathbb{R}$ and $0<\beta<1$ , let

\displaystyle\lambda^{W}(x):=w\frac{\sum_{t=1}^{\infty}\beta^{t}(x_{t}^{0*}(x)-x_{t}^{1*}(x))}{\sum_{t=0}^{\infty}\beta^{t}(u^{1*}_{t}(x)-u^{0*}_{t}(x))}-h

(6)

in which action sequences $u_{t}^{0*}(x),u_{t}^{1*}(x)$ and error variance sequences $x_{t}^{0*}(x),x_{t}^{1*}(x)$ are given in terms of $\phi_{0},\phi_{1}$ by (5). Then $\lambda^{W}(x)$ is a continuous and non-decreasing function of $x\in\mathbb{R}_{+}$ .
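To make the statement concrete, the index (6) can be evaluated by truncating both sums. This is a sketch under stated assumptions: the $t$-th discounted action term uses the action that produces variance $t+1$ (the reading used for the denominator in the proof of Theorem 1), the horizon of 2000 steps is an arbitrary truncation, and the parameter values are illustrative.

```python
def phi(x, precision):
    """Variance update (1)."""
    return (x + 1.0) / (precision * x + precision + 1.0)


def orbit(x, passive_on_tie, a, b, steps):
    """Actions and variances of an x-threshold orbit of (5); variances[0] is x itself."""
    actions, variances, current = [], [x], x
    for _ in range(steps):
        passive = current <= x if passive_on_tie else current < x
        action = 0 if passive else 1
        current = phi(current, b if action else a)
        actions.append(action)
        variances.append(current)
    return actions, variances


def whittle_index(x, a, b, beta, w=1.0, h=0.0, steps=2000):
    """Truncated evaluation of (6) for weight w and observation cost h."""
    u0, x0 = orbit(x, True, a, b, steps)    # passive-first orbit
    u1, x1 = orbit(x, False, a, b, steps)   # active-first orbit
    numerator = sum(beta ** t * (x0[t] - x1[t]) for t in range(1, steps + 1))
    denominator = sum(beta ** t * (u1[t] - u0[t]) for t in range(steps))
    return w * numerator / denominator - h


grid = [0.2, 0.5, 1.0, 2.0, 4.0]  # illustrative belief states
values = [whittle_index(x, a=0.1, b=1.0, beta=0.9) for x in grid]
for x, value in zip(grid, values):
    print(x, value)
```

On this small grid the computed values should increase with $x$, in line with the theorem; the sketch is a numerical sanity check rather than a substitute for the proof.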

Figure 1: Orbit $x^{0*}_{t}(x)$ traces the path $ABCDE\dots$ for the word $01w=01101$ . Orbit $x^{1*}_{t}(x)$ traces the path $FGHIJ\dots$ for the word $10w=10101$ . Word $w=101$ is a palindrome.

We are now ready to describe the key concepts underlying this result.

Words. In this paper, a word $w$ is a string on $\{0,1\}^{*}$ with $k^{\rm th}$ letter $w_{k}$ and $w_{i:j}:=w_{i}w_{i+1}\dots w_{j}$ . The empty word is $\epsilon$ , the concatenation of words $u,v$ is $uv$ , the word that is the $n$ -fold repetition of $w$ is $w^{n}$ , the infinite repetition of $w$ is $w^{\omega}$ and $\tilde{w}$ is the reverse of $w$ , so $w=\tilde{w}$ means $w$ is a palindrome. The length of $w$ is ${\left|w\right|}$ and ${\left|w\right|}_{u}$ is the number of times that word $u$ appears in $w$ , overlaps included.

Christoffel, Sturmian and Mechanical Words. It turns out that the action sequences in (5) are given by such words, so the following definitions are central to this paper.

The Christoffel tree (Figure 2) is an infinite complete binary tree [5] in which each node is labelled with a pair $(u,v)$ of words. The root is $(0,1)$ and the children of $(u,v)$ are $(u,uv)$ and $(uv,v)$ . The Christoffel words are the words $0,1$ and the concatenations $uv$ for all $(u,v)$ in that tree. The fractions ${\left|uv\right|}_{1}/{\left|uv\right|}_{0}$ form the Stern-Brocot tree [9] which contains each positive rational number exactly once. Also, infinite paths in the Stern-Brocot tree converge to the positive irrational numbers. Analogously, Sturmian words could be thought of as infinitely-long Christoffel words.

Alternatively, among many known characterisations, the Christoffel words can be defined as the words $0,1$ and the words $0w1$ where $a:={\left|0w1\right|}_{1}/{\left|0w1\right|}$ (a slope, not to be confused with the precision $a_{i}$ ) and

\displaystyle(0w1)_{n}:=\lfloor na\rfloor-\lfloor(n-1)a\rfloor

for any relatively prime natural numbers ${\left|0w1\right|}_{0}$ and ${\left|0w1\right|}_{1}$ and for $n=1,2,\dots,{\left|0w1\right|}$ . For example, $a=3/5$ gives the word $01011$ , so that $w=101$ . The Sturmian words are then the infinite words $0w_{1}w_{2}\cdots$ where, for $n=1,2,\dots$ and $a\in(0,1)\backslash\mathbb{Q}$ ,

\displaystyle(0w_{1}w_{2}\cdots)_{n}:=\lfloor na\rfloor-\lfloor(n-1)a\rfloor.

We use the notation $0w1$ for Sturmian words although they are infinite.

The set of mechanical words is the union of the Christoffel and Sturmian words [13]. (Note that the mechanical words are sometimes defined in terms of infinite repetitions of the Christoffel words.)
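The two descriptions of Christoffel words can be compared on the first few levels of the tree. In the sketch below the function names are hypothetical, and the floor characterisation is used in the 1-indexed form $\lfloor na\rfloor-\lfloor(n-1)a\rfloor$ , which should be read as this example's own indexing convention.

```python
from fractions import Fraction


def christoffel_words(depth):
    """Words uv for every node (u, v) in the first `depth` levels of the Christoffel tree."""
    level, words = [("0", "1")], []
    for _ in range(depth):
        words.extend(u + v for u, v in level)
        level = [child for u, v in level for child in ((u, u + v), (u + v, v))]
    return words


def slope_word(zeros, ones):
    """Word whose n-th letter is floor(n*a) - floor((n-1)*a) with a = ones/(zeros+ones)."""
    length = zeros + ones
    return "".join(
        str((n * ones) // length - ((n - 1) * ones) // length)
        for n in range(1, length + 1)
    )


words = christoffel_words(4)                       # 1 + 2 + 4 + 8 = 15 words
ratios = {Fraction(w.count("1"), w.count("0")) for w in words}
for word in words[:7]:
    centre = word[1:-1]                            # the w in 0w1
    print(word, centre == centre[::-1], slope_word(word.count("0"), word.count("1")))
```

Each printed word has the form $0w1$ with a palindromic centre $w$ , and the ratios of 1s to 0s are pairwise distinct, as expected from the Stern-Brocot tree.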

Majorisation. As in [14], let $x,y\in\mathbb{R}^{m}$ and let $x_{(i)}$ and $y_{(i)}$ be their elements sorted in ascending order. We say $x$ is weakly supermajorised by $y$ and write $x\prec^{w}y$ if

\displaystyle\sum_{k=1}^{j}x_{(k)}\geq\sum_{k=1}^{j}y_{(k)}\qquad\text{for all $j=1,\dots,m$.}

If this is an equality for $j=m$ we say $x$ is majorised by $y$ and write $x\prec y$ . It turns out that

\displaystyle x\prec y\qquad\Leftrightarrow\qquad\sum_{k=1}^{j}x_{[k]}\leq\sum_{k=1}^{j}y_{[k]}\quad\text{for $j=1,\dots,m-1$ with equality for $j=m$}

where $x_{[k]},y_{[k]}$ are the sequences sorted in descending order. For $x,y\in\mathbb{R}^{m}$ we have [14]

\displaystyle x\prec y\qquad\Leftrightarrow\qquad\sum_{i=1}^{m}f(x_{i})\leq\sum_{i=1}^{m}f(y_{i})\quad\text{for all convex functions $f:\mathbb{R}\rightarrow\mathbb{R}$.}

More generally, a real-valued function $\phi$ defined on a subset $\mathcal{A}$ of $\mathbb{R}^{m}$ is said to be Schur-convex on $\mathcal{A}$ if $x\prec y$ implies that $\phi(x)\leq\phi(y)$ .
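The definitions above translate into a few lines of code. This is a minimal sketch with hypothetical function names and a small tolerance for floating-point sums; the vectors are illustrative.

```python
def weakly_supermajorised(x, y, tol=1e-12):
    """True if x is weakly supermajorised by y: the sums of the j smallest
    entries of x are at least those of y for every j."""
    assert len(x) == len(y)
    sum_x = sum_y = 0.0
    for x_k, y_k in zip(sorted(x), sorted(y)):
        sum_x += x_k
        sum_y += y_k
        if sum_x < sum_y - tol:
            return False
    return True


def majorised(x, y, tol=1e-12):
    """True if x is majorised by y: weak supermajorisation plus equal totals."""
    return weakly_supermajorised(x, y, tol) and abs(sum(x) - sum(y)) <= tol


x, y = [2.0, 2.0], [1.0, 3.0]
convex_gap = sum(t * t for t in y) - sum(t * t for t in x)                # convex f(t) = t^2
decreasing_gap = sum(1 / t ** 2 for t in y) - sum(1 / t ** 2 for t in x)  # f(t) = 1/t^2
print(majorised(x, y), convex_gap, decreasing_gap)
```

Here the more spread-out vector $y$ yields the larger sum for both convex functions, which is the behaviour that Schur-convexity formalises.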

Möbius Transformations. Let $\mu_{A}(x)$ denote the Möbius transformation $\mu_{A}(x):=\frac{A_{11}x+A_{12}}{A_{21}x+A_{22}}$ where $A\in\mathbb{R}^{2\times 2}$ . Möbius transformations such as $\phi_{0}(\cdot),\phi_{1}(\cdot)$ are closed under composition, so for any word $w$ we define $\phi_{w}(x):=\phi_{w_{{\left|w\right|}}}\circ\dots\circ\phi_{w_{2}}\circ\phi_{w_{1}}(x)$ and $\phi_{\epsilon}(x):=x.$

Intuition. Here is the intuition behind our main result.

For any $x\in\mathbb{R}_{+}$ , the orbits in (5) correspond to a particular mechanical word $0,1$ or $0w1$ depending on the value of $x$ (Figure 1). Specifically, for any word $u$ , let $y_{u}$ be the fixed point of the mapping $\phi_{u}$ on $\mathbb{R}_{+}$ so that $\phi_{u}(y_{u})=y_{u}$ and $y_{u}\in\mathbb{R}_{+}$ . Then the word corresponding to $x$ is 1 for $0\leq x\leq y_{1}$ , $0w1$ for $x\in[y_{01w},y_{10w}]$ and 0 for $y_{0}\leq x<\infty$ . In passing we note that these fixed points are sorted in ascending order by the ratio $\rho:={\left|01w\right|}_{0}/{\left|01w\right|}_{1}$ of counts of 0s to counts of 1s, as illustrated by Figure 3. Interestingly, it turns out that ratio $\rho$ is a piecewise-constant yet continuous function of $x$ , reminiscent of the Cantor function.

Also, composition of Möbius transformations corresponds to matrix multiplication (the map $A\mapsto\mu_{A}$ is a homomorphism) so that

\displaystyle\mu_{A}\circ\mu_{B}(x)=\mu_{AB}(x)\qquad\text{for any $A,B\in\mathbb{R}^{2\times 2}.$}
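This identity is easy to confirm with exact rational arithmetic. In the sketch below the matrices encode $\phi_{0}$ and $\phi_{1}$ for the illustrative precisions $a=1/10$ and $b=1$ ; the helper names are assumptions of the example.

```python
from fractions import Fraction


def mobius(A, x):
    """Moebius transformation mu_A(x) = (A11 x + A12) / (A21 x + A22)."""
    return (A[0][0] * x + A[0][1]) / (A[1][0] * x + A[1][1])


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


a, b = Fraction(1, 10), Fraction(1)   # illustrative precisions
F = ((1, 1), (a, 1 + a))              # mu_F is phi_0
G = ((1, 1), (b, 1 + b))              # mu_G is phi_1
x = Fraction(3, 4)
lhs = mobius(G, mobius(F, x))         # phi_1 applied after phi_0
rhs = mobius(matmul(G, F), x)         # single transformation with matrix GF
print(lhs, rhs)
```

Because the later letter multiplies on the left, the word $01$ corresponds to the matrix product $GF$ , which is the ordering used when matrix products are attached to words.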

Thus, the index (6) can be written in terms of the orbits of a linear system (11) given by $0,1$ or $0w1.$ Further, if $A\in\mathbb{R}^{2\times 2}$ and $\det(A)=1$ then the gradient of the corresponding Möbius transformation is the convex function

\displaystyle\frac{d\mu_{A}(x)}{dx}=\frac{1}{(A_{21}x+A_{22})^{2}}.

So the gradient of the index is the difference of the sums of a convex function of the linear-system orbits. However, such sums are Schur-convex functions and it follows that the index is increasing because one orbit weakly supermajorises the other, as we now show for the case $0w1$ (noting that the proof is easier for words $0,1$ ). As $0w1$ is a mechanical word, $w$ is a palindrome. Further, if $w$ is a palindrome, it turns out that the difference between the linear-system orbits increases with $x$ . So, we might define the majorisation point for $w$ as the $x$ for which one orbit majorises the other. Quite remarkably, if $w$ is a palindrome then the majorisation point is $\phi_{w}(0)$ (Proposition 7). Indeed the black circles and blue dots of Figure 3 coincide. Finally, $\phi_{w}(0)$ is less than or equal to $y_{01w}$ which is the least $x$ for which the orbits correspond to the word $0w1$ . Indeed, the blue dots of Figure 3 are below the corresponding black dots. Thus one orbit does indeed supermajorise the other.

4 Proof of Main Result

4.1 Mechanical Words

The Möbius transformations of (1) satisfy the following assumption for $\mathcal{I}:=\mathbb{R}_{+}$ . We prove in the supplementary material that the fixed point $y_{w}$ of word $w$ (the solution to $\phi_{w}(x)=x$ on $\mathcal{I}$ ) is unique.

Assumption A2. Functions $\phi_{0}:\mathcal{I}\rightarrow\mathcal{I},\phi_{1}:\mathcal{I}\rightarrow\mathcal{I}$ , where $\mathcal{I}$ is an interval of $\mathbb{R}$ , are increasing and non-expansive, so for all $x,y\in\mathcal{I}:x<y$ and for $k\in\{0,1\}$ we have

\displaystyle\underbrace{\phi_{k}(x)<\phi_{k}(y)}_{\text{increasing}}\qquad\qquad\text{and}\qquad\qquad\underbrace{\phi_{k}(y)-\phi_{k}(x)<y-x}_{\text{non-expansive}}.

Furthermore, the fixed points $y_{0},y_{1}$ of $\phi_{0},\phi_{1}$ on $\mathcal{I}$ satisfy $y_{1}<y_{0}$ .

Hence the following two propositions (supplementary material) apply to $\phi_{0},\phi_{1}$ of (1) on $\mathcal{I}=\mathbb{R}_{+}$ .

Proposition 1.

Suppose A2 holds, $x\in\mathcal{I}$ and $w$ is a non-empty word. Then

\displaystyle x<\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)<y_{w}\ \Leftrightarrow\ x<y_{w}

and

\displaystyle x>\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)>y_{w}\ \Leftrightarrow\ x>y_{w}.

For a given $x$ , in the notation of (5), we call the shortest word $u$ such that $(u^{1*}_{1},u^{1*}_{2},\dots)=u^{\omega}$ the $x$ -threshold word. Proposition 2 generalises a recent result about $x$ -threshold words in a setting where $\phi_{0},\phi_{1}$ are linear [18].

Proposition 2.

Suppose A2 holds and $0w1$ is a mechanical word. Then

\displaystyle\text{$0w1$ is the $x$-threshold word}\ \Leftrightarrow\ x\in[y_{01w},y_{10w}].

Also, if $x_{0},x_{1}\in\mathcal{I}$ with $x_{0}\geq y_{0}$ and $x_{1}\leq y_{1}$ then the $x_{0}$ - and $x_{1}$ -threshold words are $0$ and $1$ .
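Proposition 2 can be probed numerically for the maps of (1). The sketch below finds the fixed points $y_{01w}$ and $y_{10w}$ by plain iteration and then records the actions of the active-first orbit for a threshold at the midpoint of the interval. The precisions, the choice of words and all function names are illustrative assumptions.

```python
def phi(x, precision):
    """Variance update (1)."""
    return (x + 1.0) / (precision * x + precision + 1.0)


def phi_word(word, x, a, b):
    """phi_w(x): apply the map of the first letter first and of the last letter last."""
    for letter in word:
        x = phi(x, b if letter == "1" else a)
    return x


def fixed_point(word, a, b, iters=500):
    """Fixed point y_w of phi_w, approximated by iterating from 1.0."""
    y = 1.0
    for _ in range(iters):
        y = phi_word(word, y, a, b)
    return y


def active_first_actions(x, a, b, steps):
    """Action string of the orbit u^{1*} in (5): passive only when strictly below x."""
    current, out = x, []
    for _ in range(steps):
        action = "0" if current < x else "1"
        current = phi(current, b if action == "1" else a)
        out.append(action)
    return "".join(out)


a, b = 0.1, 1.0  # illustrative precisions
results = []
for w in ["", "0", "1", "010", "101"]:      # centres of a few Christoffel words 0w1
    lo, hi = fixed_point("01" + w, a, b), fixed_point("10" + w, a, b)
    x = 0.5 * (lo + hi)                     # threshold strictly inside the interval
    period = "10" + w
    actions = active_first_actions(x, a, b, 4 * len(period))
    results.append((lo, hi, actions, period * 4))
    print(repr(w), round(lo, 4), round(hi, 4), actions)
```

In each case the recorded actions repeat the word $10w$ , matching the description of the active-first orbit used in the proof of Theorem 1.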

We also use the following very interesting fact (Proposition 4.2 on p.28 of [5]).

Proposition 3.

Suppose $0w1$ is a mechanical word. Then $w$ is a palindrome.

4.2 Properties of the Linear-System Orbits $M(w)$ and Prefix Sums $S(w)$

Definition. Assume that $a,b\in\mathbb{R}_{+}$ and $a<b$ . Consider the matrices

\displaystyle F:=\begin{pmatrix}1&1\\ a&1+a\end{pmatrix},\qquad G:=\begin{pmatrix}1&1\\ b&1+b\end{pmatrix}\quad\text{and}\quad K:=\begin{pmatrix}-1&-1\\ 0&1\end{pmatrix}

so that the Möbius transformations $\mu_{F},\mu_{G}$ are the functions $\phi_{0},\phi_{1}$ of (1) and $GF-FG=(b-a)K$ . Given any word $w\in\{0,1\}^{*}$ , we define the matrix product $M(w)$

\displaystyle M(w):=M(w_{{\left|w\right|}})\cdots M(w_{1}),\quad\text{where $M(\epsilon):=I,M(0):=F$ and $M(1):=G$}

where $I\in\mathbb{R}^{2\times 2}$ is the identity and the prefix sum $S(w)$ as the matrix polynomial

\displaystyle S(w):=\sum_{k=1}^{{\left|w\right|}}M(w_{1:k}),\qquad\text{where $S(\epsilon):=0$ (the all-zero matrix).}

(7)

For any $A\in\mathbb{R}^{2\times 2}$ , let $\text{tr}(A)$ be the trace of $A$ , let $A_{ij}=[A]_{ij}$ be the entries of $A$ and let $A\geq 0$ indicate that all entries of $A$ are non-negative.

Remark. Clearly, $\det(F)=\det(G)=1$ so that $\det(M(w))=1$ for any word $w$ . Also, $S(w)$ corresponds to the partial sums of the linear-system orbits, as hinted in the previous section.
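The definitions of $M(w)$ and $S(w)$ and the facts in the remark can be checked exactly with rational arithmetic. The following sketch uses illustrative precisions $a=1/10$ , $b=1$ and hypothetical helper names.

```python
from fractions import Fraction

a, b = Fraction(1, 10), Fraction(1)  # illustrative precisions with a < b
F = ((Fraction(1), Fraction(1)), (a, 1 + a))
G = ((Fraction(1), Fraction(1)), (b, 1 + b))
K = ((Fraction(-1), Fraction(-1)), (Fraction(0), Fraction(1)))
IDENTITY = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
ZERO = ((Fraction(0), Fraction(0)), (Fraction(0), Fraction(0)))


def matmul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def matadd(A, B, sign=1):
    """A + sign * B for 2x2 matrices."""
    return tuple(tuple(A[i][j] + sign * B[i][j] for j in range(2)) for i in range(2))


def M(word):
    """M(w) = M(w_|w|) ... M(w_1): each new letter multiplies on the left."""
    out = IDENTITY
    for letter in word:
        out = matmul(G if letter == "1" else F, out)
    return out


def S(word):
    """Prefix sum (7): M(w_{1:1}) + ... + M(w_{1:|w|})."""
    total = ZERO
    for k in range(1, len(word) + 1):
        total = matadd(total, M(word[:k]))
    return total


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


commutator = matadd(matmul(G, F), matmul(F, G), sign=-1)  # GF - FG
print(det(M("01101")), commutator, S("10"))
```

Exact fractions avoid any rounding doubt when testing identities such as $\det(M(w))=1$ and $GF-FG=(b-a)K$ .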

The following proposition captures the role of palindromes (proof in the supplementary material).

Proposition 4.

Suppose $w$ is a word, $p$ is a palindrome and $n\in\mathbb{Z}_{+}$ . Then

1. $M(p)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix}$ for some $f,h\in\mathbb{R}$ ,
2. $\text{tr}(M(10p))=\text{tr}(M(01p))$ ,
3. If $u\in\{p(10p)^{n},(10p)^{n}10\}$ then $M(u)-M(\tilde{u})=\lambda K$ for some $\lambda\in\mathbb{R}_{-}$ ,
4. If $w$ is a prefix of $p$ then $[M(p(10p)^{n}10w)]_{22}\leq[M(p(01p)^{n}01w)]_{22}$ ,
5. $[M((10p)^{n}10w)]_{21}\geq[M((01p)^{n}01w)]_{21}$ ,
6. $[M((10p)^{n}1)]_{21}\geq[M((01p)^{n}0)]_{21}$ .
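The first three claims of Proposition 4 lend themselves to an exact check on small palindromes. The sketch below is only a spot check under illustrative precisions $a=1/10$ , $b=1$ ; the helper names are assumptions and the remaining claims are not tested here.

```python
from fractions import Fraction

a, b = Fraction(1, 10), Fraction(1)  # illustrative precisions with a < b
F = ((Fraction(1), Fraction(1)), (a, 1 + a))
G = ((Fraction(1), Fraction(1)), (b, 1 + b))
IDENTITY = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))


def matmul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def M(word):
    """M(w) = M(w_|w|) ... M(w_1)."""
    out = IDENTITY
    for letter in word:
        out = matmul(G if letter == "1" else F, out)
    return out


def check_claims(p, n):
    """Return booleans for Claims 1, 2 and 3 of Proposition 4 for palindrome p."""
    Mp = M(p)
    f, h = Mp[0][1], Mp[1][1]
    claim1 = Mp[0][0] == (f * h + 1) / (h + f) and Mp[1][0] == (h * h - 1) / (h + f)
    M10, M01 = M("10" + p), M("01" + p)
    claim2 = M10[0][0] + M10[1][1] == M01[0][0] + M01[1][1]
    claim3 = True
    for u in (p + ("10" + p) * n, ("10" + p) * n + "10"):
        A, B = M(u), M(u[::-1])
        D = tuple(tuple(A[i][j] - B[i][j] for j in range(2)) for i in range(2))
        lam = D[1][1]                       # K has a 1 in its (2,2) entry
        claim3 = claim3 and lam <= 0 and D == ((-lam, -lam), (0, lam))
    return claim1, claim2, claim3


outcomes = [check_claims(p, n) for p in ["", "0", "1", "101", "0110"] for n in range(3)]
print(all(all(result) for result in outcomes))
```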

We now demonstrate a surprisingly simple relation between $S(w)$ and $M(w)$ .

Proposition 5.

Suppose $w$ is a palindrome. Then

\displaystyle S_{21}(w)=M_{22}(w)-1\qquad\text{and}\qquad S_{22}(w)=M_{12}(w)+S_{21}(w).

(8)

Furthermore, if $\Delta_{k}:=[S(10w)M(w(10w)^{k})-S(01w)M(w(01w)^{k})]_{22}$ then

\displaystyle\Delta_{k}=0\qquad\text{for all $k\in\mathbb{Z}_{+}$.}

(9)
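Both parts of Proposition 5 can be spot-checked exactly before reading the proof. The sketch below evaluates (8) and $\Delta_{k}$ for a handful of palindromes, using illustrative precisions $a=1/10$ , $b=1$ and hypothetical helper names.

```python
from fractions import Fraction

a, b = Fraction(1, 10), Fraction(1)  # illustrative precisions with a < b
F = ((Fraction(1), Fraction(1)), (a, 1 + a))
G = ((Fraction(1), Fraction(1)), (b, 1 + b))
IDENTITY = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
ZERO = ((Fraction(0), Fraction(0)), (Fraction(0), Fraction(0)))


def matmul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def M(word):
    """M(w) = M(w_|w|) ... M(w_1)."""
    out = IDENTITY
    for letter in word:
        out = matmul(G if letter == "1" else F, out)
    return out


def S(word):
    """Prefix sum (7)."""
    total = ZERO
    for k in range(1, len(word) + 1):
        P = M(word[:k])
        total = tuple(tuple(total[i][j] + P[i][j] for j in range(2)) for i in range(2))
    return total


def delta(w, k):
    """Delta_k = [S(10w) M(w(10w)^k) - S(01w) M(w(01w)^k)]_22."""
    left = matmul(S("10" + w), M(w + ("10" + w) * k))
    right = matmul(S("01" + w), M(w + ("01" + w) * k))
    return left[1][1] - right[1][1]


palindromes = ["", "0", "1", "00", "101", "0110", "10101"]
relation_8 = [
    S(w)[1][0] == M(w)[1][1] - 1 and S(w)[1][1] == M(w)[0][1] + S(w)[1][0]
    for w in palindromes
]
deltas = [delta(w, k) for w in palindromes for k in range(4)]
print(all(relation_8), set(deltas))
```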

Proof.

Let us write $M:=M(w),S:=S(w)$ . We prove (8) by induction on ${\left|w\right|}$ . In the base case $w\in\{\epsilon,0,1\}$ . For $w=\epsilon$ , $M_{22}-1=0=S_{21},M_{12}+S_{21}=0=S_{22}.$ For $w\in\{0,1\}$ , $M_{22}-1=c=S_{21},M_{12}+S_{21}=1+c=S_{22}$ for some $c\in\{a,b\}$ . For the inductive step, in accordance with Claim 1 of Proposition 4, assume $w\in\{0v0,1v1\}$ for some word $v$ satisfying

\displaystyle M(v)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix},

\displaystyle S(v)=\begin{pmatrix}c&d\\ h-1&f+h-1\end{pmatrix}\quad\text{for some $c,d,f,h\in\mathbb{R}$.}

For $w=1v1$ , $M:=M(1v1)=GM(v)G$ and $S:=S(1v1)=GM(v)G+S(v)G+G$ . Calculating the corresponding matrix products and sums gives

\displaystyle S_{21}=(bh+h+bf-1)(bh+2h+bf+f+1)(h+f)^{-1}=M_{22}-1

\displaystyle S_{22}-S_{21}=bh+2h+bf+f=M_{12}

as claimed. For $w=0v0$ the claim also holds as $F=\left.G\right|_{b=a}$ . This completes the proof of (8).

Furthermore Part. Let $A:=S(w)FG+FG+G$ and $B:=S(w)GF+GF+F$ . Then

\displaystyle\Delta_{k}=[(A(M(w)FG)^{k}-B(M(w)GF)^{k})M(w)]_{22}

(10)

by definition of $S(\cdot)$ . By Claim 1 of Proposition 4 and (8) we know that

\displaystyle M(w)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix},

\displaystyle S(w)=\begin{pmatrix}c&d\\ h-1&f+h-1\end{pmatrix}\quad\text{for some $c,d,f,h\in\mathbb{R}$.}

Substituting these expressions and the definitions of $F,G$ into the definitions of $A,B$ and then into (10) for $k\in\{0,1\}$ directly gives $\Delta_{0}=\Delta_{1}=0$ (although this calculation is long).

Now consider the case $k\geq 2$ . Claim 2 of Proposition 4 says $\text{tr}(M(10w))=\text{tr}(M(01w))$ and clearly $\det(M(10w))=\det(M(01w))=1$ . Thus we can diagonalise as

\displaystyle M(w)FG=:UDU^{-1},

\displaystyle M(w)GF=:VDV^{-1},

\displaystyle D:=\text{diag}(\lambda,1/\lambda)\quad\text{for some $\lambda\geq 1$}

so that $\Delta_{k}=[AUD^{k}U^{-1}M(w)-BVD^{k}V^{-1}M(w)]_{22}=:\gamma_{1}\lambda^{k}+\gamma_{2}\lambda^{-k}.$ So, if $\lambda=1$ then $\Delta_{k}=\gamma_{1}+\gamma_{2}=\Delta_{0}$ and we already showed that $\Delta_{0}=0$ . Otherwise $\lambda\neq 1$ , so $\Delta_{0}=\Delta_{1}=0$ implies $\gamma_{1}+\gamma_{2}=\gamma_{1}\lambda+\gamma_{2}\lambda^{-1}=0$ which gives $\gamma_{1}=\gamma_{2}=0$ . Thus for any $k\in\mathbb{Z}_{+}$ we have $\Delta_{k}=\gamma_{1}\lambda^{k}+\gamma_{2}\lambda^{-k}=0$ . ∎

4.3 Majorisation

The following is a straightforward consequence of results in [14] and is proved in the supplementary material. We emphasise that the notation $\prec^{w}$ has nothing to do with the notion of $w$ as a word.

Proposition 6.

Suppose $x,y\in\mathbb{R}_{+}^{m}$ and $f:\mathbb{R}\rightarrow\mathbb{R}$ is a symmetric function that is convex and decreasing on $\mathbb{R}_{+}$ . Then $\text{$x\prec^{w}y$ and $\beta\in[0,1]$}\quad\Rightarrow\quad\sum_{i=1}^{m}\beta^{i}f(x_{(i)})\leq\sum_{i=1}^{m}\beta^{i}f(y_{(i)})$ .

For any $x\in\mathbb{R}$ and any fixed word $w$ , define the sequences for $n\in\mathbb{Z}_{+}$ and $k=1,\dots,m$

\displaystyle\left.\begin{aligned} x_{nm+k}(x)&:=[M((10w)^{n}(10w)_{1:k})v(x)]_{2},&\sigma_{x}^{(n)}:=(x_{nm+1}(x),\dots,x_{nm+m}(x))\\ y_{nm+k}(x)&:=[M((01w)^{n}(01w)_{1:k})v(x)]_{2},&\sigma_{y}^{(n)}:=(y_{nm+1}(x),\dots,y_{nm+m}(x))\end{aligned}\right\}

(11)

where $m:={\left|10w\right|}$ and $v(x):=(x,1)^{T}.$

Proposition 7.

Suppose $w$ is a palindrome and $x\geq\phi_{w}(0)$ . Then $\sigma_{x}^{(n)}$ and $\sigma_{y}^{(n)}$ are ascending sequences on $\mathbb{R}_{+}$ and $\sigma_{x}^{(n)}\prec^{w}\sigma_{y}^{(n)}$ for any $n\in\mathbb{Z}_{+}$ .
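The content of Proposition 7 can be explored with exact arithmetic by forming the sequences in (11) and their partial-sum differences. In this sketch the precisions $a=1/10$ , $b=1$ , the palindromes tested and the helper names are all illustrative assumptions.

```python
from fractions import Fraction

a, b = Fraction(1, 10), Fraction(1)  # illustrative precisions with a < b


def apply_word(word, x):
    """Return M(word) v(x) for v(x) = (x, 1)^T, applying the first letter first."""
    top, bottom = Fraction(x), Fraction(1)
    for letter in word:
        c = b if letter == "1" else a
        top, bottom = top + bottom, c * top + (1 + c) * bottom
    return top, bottom


def phi_word_at_zero(word):
    """phi_w(0) = [M(w)]_12 / [M(w)]_22."""
    top, bottom = apply_word(word, 0)
    return top / bottom


def sequences(w, x, n):
    """The blocks sigma_x^(n) and sigma_y^(n) of (11)."""
    upper, lower = "10" + w, "01" + w
    m = len(upper)
    sigma_x = [apply_word(upper * n + upper[:k], x)[1] for k in range(1, m + 1)]
    sigma_y = [apply_word(lower * n + lower[:k], x)[1] for k in range(1, m + 1)]
    return sigma_x, sigma_y


def partial_sum_gaps(w, x, n):
    """T_1, ..., T_m: running sums of the entrywise differences x_k - y_k."""
    sigma_x, sigma_y = sequences(w, x, n)
    gaps, total = [], Fraction(0)
    for x_k, y_k in zip(sigma_x, sigma_y):
        total += x_k - y_k
        gaps.append(total)
    return gaps


report = []
for w in ["", "0", "1", "101", "010"]:
    x_star = phi_word_at_zero(w)                 # the majorisation point
    for n in range(3):
        at_point = partial_sum_gaps(w, x_star, n)
        above = partial_sum_gaps(w, x_star + 1, n)
        sigma_x, sigma_y = sequences(w, x_star, n)
        ascending = sigma_x == sorted(sigma_x) and sigma_y == sorted(sigma_y)
        report.append((at_point, above, ascending))
print(all(min(p) >= 0 and p[-1] == 0 and min(q) >= 0 and asc for p, q, asc in report))
```

Non-negative running sums of ascending sequences are exactly the condition $\sigma_{x}^{(n)}\prec^{w}\sigma_{y}^{(n)}$ , and the final gap vanishing at $x=\phi_{w}(0)$ reflects (9).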

Proof.

Clearly $\phi_{w}(0)\geq 0$ so $x\geq 0$ and hence $v(x)\geq 0$ . So for any word $u$ and letter $c\in\{0,1\}$ we have $M(uc)v(x)=M(c)M(u)v(x)\geq M(u)v(x)\geq 0$ as $M(c)\geq I$ . Thus $x_{k+1}(x)\geq x_{k}(x)\geq 0$ and $y_{k+1}(x)\geq y_{k}(x)\geq 0$ . In conclusion, $\sigma_{x}^{(n)}$ and $\sigma_{y}^{(n)}$ are ascending sequences on $\mathbb{R}_{+}$ .

Now $\phi_{w}(0)=\frac{[M(w)]_{12}}{[M(w)]_{22}}$ . Thus $[Av(\phi_{w}(0))]_{2}=\frac{[AM(w)]_{22}}{[M(w)]_{22}}$ for any $A\in\mathbb{R}^{2\times 2}$ . So

\displaystyle x_{nm+k}(\phi_{w}(0))-y_{nm+k}(\phi_{w}(0))=\frac{1}{[M(w)]_{22}}\left[(M((10w)^{n}(10w)_{1:k})-M((01w)^{n}(01w)_{1:k}))M(w)\right]_{22}\leq 0

for $k=2,\dots,m$ by Claim 4 of Proposition 4. So all but the first term of the sum $T_{m}(\phi_{w}(0))$ are non-positive where

\displaystyle T_{j}(x):=\sum_{k=1}^{j}(x_{nm+k}(x)-y_{nm+k}(x)).

Thus $T_{1}(\phi_{w}(0))\geq T_{2}(\phi_{w}(0))\geq\dots\geq T_{m}(\phi_{w}(0))$ . But

\displaystyle T_{m}(\phi_{w}(0))=\frac{1}{[M(w)]_{22}}\sum_{k=1}^{m}\left[(M((10w)^{n}(10w)_{1:k})-M((01w)^{n}(01w)_{1:k}))M(w)\right]_{22}=\frac{1}{[M(w)]_{22}}\left[S(10w)M(w(10w)^{n})-S(01w)M(w(01w)^{n})\right]_{22}=0

where the last step follows from (9). So $T_{j}(\phi_{w}(0))\geq 0$ for $j=1,\dots,m$ . Yet Claims 5 and 6 of Proposition 4 give $\frac{d}{dx}T_{j}(x)=\sum_{k=1}^{j}[M((10w)^{n}(10w)_{1:k})-M((01w)^{n}(01w)_{1:k})]_{21}\geq 0.$ So for $x\geq\phi_{w}(0)$ we have $T_{j}(x)\geq 0$ for $j=1,\dots,m$ which means that $\sigma_{x}^{(n)}\prec^{w}\sigma_{y}^{(n)}$ . ∎

4.4 Indexability

Theorem 1.

The index $\lambda^{W}(x)$ of (6) is continuous and non-decreasing for $x\in\mathbb{R}_{+}$ .

Proof.

As weight $w$ is non-negative and cost $h$ is a constant we only need to prove the result for $\lambda(x):=\left.\lambda^{W}(x)\right|_{w=1,h=0}$ and we can use $w$ to denote a word. By Proposition 2, $x\in[y_{01w},y_{10w}]$ for some mechanical word $0w1$ . (Cases $x\notin(y_{1},y_{0})$ are clarified in the supplementary material.)

Let us show that the hypotheses of Proposition 7 are satisfied by $w$ and $x$ . Firstly, $w$ is a palindrome by Proposition 3. Secondly, $\phi_{w01}(0)\geq 0$ and as $\phi_{w}(\cdot)$ is monotonically increasing, it follows that $\phi_{w}\circ\phi_{w01}(0)\geq\phi_{w}(0)$ . Equivalently, $\phi_{01w}\circ\phi_{w}(0)\geq\phi_{w}(0)$ so that $\phi_{w}(0)\leq y_{01w}$ by Proposition 1. Hence $x\geq y_{01w}\geq\phi_{w}(0)$ .

Thus Proposition 7 applies, showing that the sequences $\sigma_{x}^{(n)}$ and $\sigma_{y}^{(n)}$ , with elements $x_{nm+k}(x)$ and $y_{nm+k}(x)$ as defined in (11), are non-decreasing sequences on $\mathbb{R}_{+}$ with $\sigma_{x}^{(n)}\prec^{w}\sigma_{y}^{(n)}$ . Also, $1/x^{2}$ is a symmetric function that is convex and decreasing on $\mathbb{R}_{+}$ . Therefore Proposition 6 applies: as $\sigma_{x}^{(n)}$ is weakly supermajorised by $\sigma_{y}^{(n)}$ , the discounted sum of this function over $\sigma_{x}^{(n)}$ is at most that over $\sigma_{y}^{(n)}$ , giving

\displaystyle\sum_{k=1}^{m}\left(\frac{\beta^{nm+k-1}}{(y_{nm+k}(x))^{2}}-\frac{\beta^{nm+k-1}}{(x_{nm+k}(x))^{2}}\right)\geq 0\qquad\text{for any $n\in\mathbb{Z}_{+}$ where $m:={\left|01w\right|}$.}

(12)

Also Proposition 2 shows that the $x$ -threshold orbits are $(\phi_{u_{1}}(x),\dots,\phi_{u_{1:k}}(x),\dots)$ and $(\phi_{l_{1}}(x),\dots,\phi_{l_{1:k}}(x),\dots)$ where $u:=(01w)^{\omega}$ and $l:=(10w)^{\omega}$ . So the denominator of (6) is

\displaystyle\sum_{k=0}^{\infty}\beta^{k}({\bf 1}_{l_{k+1}=1}-{\bf 1}_{u_{k+1}=1})=\sum_{k=0}^{\infty}\beta^{mk}(1-\beta)\Rightarrow\lambda(x)=\frac{1-\beta^{m}}{1-\beta}\sum_{k=1}^{\infty}\beta^{k-1}(\phi_{u_{1:k}}(x)-\phi_{l_{1:k}}(x)).

Note that $\frac{d}{dx}\frac{ex+f}{gx+h}=\frac{1}{(gx+h)^{2}}$ for any $eh-fg=1$ , and that by (11) the denominators of $\phi_{u_{1:nm+k}}(x)$ and $\phi_{l_{1:nm+k}}(x)$ are $y_{nm+k}(x)$ and $x_{nm+k}(x)$ respectively. Then (12) gives

\displaystyle\frac{d\lambda(x)}{dx}=\frac{1-\beta^{m}}{1-\beta}\sum_{n=0}^{\infty}\sum_{k=1}^{m}\left(\frac{\beta^{nm+k-1}}{(y_{nm+k}(x))^{2}}-\frac{\beta^{nm+k-1}}{(x_{nm+k}(x))^{2}}\right)\geq 0.

But $\lambda(x)$ is continuous for $x\in\mathbb{R}_{+}$ (as shown in the supplementary material). Therefore we conclude that $\lambda(x)$ is non-decreasing for $x\in\mathbb{R}_{+}$ . ∎
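The derivative identity used in the proof above can be checked numerically. The following is a minimal sketch under stated assumptions: the coefficients in `COEFFS` are hypothetical, chosen only so that $eh-fg=1$, and are not the coefficients of the Kalman-filter maps. The helper names are ours.

```python
def mobius(coeffs, x):
    """Evaluate (e*x + f) / (g*x + h) for coeffs = (e, f, g, h)."""
    e, f, g, h = coeffs
    return (e * x + f) / (g * x + h)


def central_difference(func, x, step=1e-6):
    """Symmetric finite-difference estimate of the derivative of func at x."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


# Hypothetical coefficients, chosen only so that e*h - f*g = 1.
COEFFS = (2.0, 1.0, 3.0, 2.0)


def max_identity_error(points, coeffs=COEFFS):
    """Largest gap between the numerical derivative and 1 / (g*x + h)^2."""
    _, _, g, h = coeffs
    worst = 0.0
    for x in points:
        numeric = central_difference(lambda t: mobius(coeffs, t), x)
        closed_form = 1.0 / (g * x + h) ** 2
        worst = max(worst, abs(numeric - closed_form))
    return worst


print(max_identity_error([0.0, 0.5, 1.0, 4.0]))
```

The printed gap should be at the level of finite-difference error, which is what one expects if the identity holds at the sampled points.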

5 Further Work

One might attempt to prove that assumption A1 holds using general results about monotone optimal policies for two-action MDPs based on submodularity [2] or multimodularity [1]. However, we find counter-examples to the required submodularity condition. Rather, we are optimistic that the ideas of this paper themselves offer an alternative approach to proving A1. It would then be natural to extend our results to settings where the underlying state evolves as $Z_{t+1}\mid\mathcal{H}_{t}\sim\mathcal{N}(mZ_{t},1)$ for some multiplier $m\neq 1$ and to cost functions other than the variance. Finally, the question of the indexability of the discrete-time Kalman filter in multiple dimensions remains open.

References

[1] E. Altman, B. Gaujal, and A. Hordijk. Multimodularity, convexity, and optimization properties. Mathematics of Operations Research, 25(2):324–347, 2000.
[2] E. Altman and S. Stidham Jr. Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems, 21(3-4):267–291, 1995.
[3] M. Araya, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards. In Neural Information Processing Systems, pages 64–72, 2010.
[4] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 671–680, 2014.
[5] J. Berstel, A. Lauve, C. Reutenauer, and F. Saliola. Combinatorics on Words: Christoffel Words and Repetitions in Words. CRM Monograph Series, 2008.
[6] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, Vol. 5. NOW, 2012.
[7] Y. Chen, H. Shioi, C. Montesinos, L. P. Koh, S. Wich, and A. Krause. Active detection via adaptive submodularity. In Proceedings of The 31st International Conference on Machine Learning, pages 55–63, 2014.
[8] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.
[9] R. Graham, D. Knuth, and O. Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, 1994.
[10] S. Guha, K. Munagala, and P. Shi. Approximation algorithms for restless bandit problems. Journal of the ACM, 58(1):3, 2010.
[11] B. La Scala and B. Moran. Optimal target tracking with restless bandits. Digital Signal Processing, 16(5):479–487, 2006.
[12] J. Le Ny, E. Feron, and M. Dahleh. Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control, 56(6):1381–1394, 2011.
[13] M. Lothaire. Algebraic combinatorics on words. Cambridge University Press, 2002.
[14] A. Marshall, I. Olkin, and B. Arnold. Inequalities: Theory of majorization and its applications. Springer Science & Business Media, 2010.
[15] L. Meier, J. Peschon, and R. Dressler. Optimal control of measurement subsystems. IEEE Trans. Automatic Control, 12(5):528–536, 1967.
[16] J. Niño-Mora and S. Villar. Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. In Proceedings of the 48th IEEE Conference on Decision and Control, pages 2905–2910, 2009.
[17] R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In Algorithmic Learning Theory, pages 214–228. Springer, 2012.
[18] B. Rajpathak, H. Pillai, and S. Bandyopadhyay. Analysis of stable periodic orbits in the one dimensional linear piecewise-smooth discontinuous map. Chaos, 22(3):033126, 2012.
[19] T. Thiele. Sur la compensation de quelques erreurs quasi-systématiques par la méthode des moindres carrés. CA Reitzel, 1880.
[20] I. Verloop. Asymptotic optimal control of multi-class restless bandits. CNRS Technical Report, hal-00743781, 2014.
[21] S. Villar. Restless bandit index policies for dynamic sensor scheduling optimization. PhD thesis, Statistics Department, Universidad Carlos III de Madrid, 2012.
[22] E. Vul, G. Alvarez, J. B. Tenenbaum, and M. J. Black. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. In Neural Information Processing Systems, pages 1955–1963, 2009.
[23] R. R. Weber and G. Weiss. On an index policy for restless bandits. Journal of Applied Probability, pages 637–648, 1990.
[24] P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, pages 287–298, 1988.

6 Supplementary Material: Introduction

The results used but not proved in the main paper are given here as:

• Proposition 9, which was used to show that $\phi_{w}(0)\leq x$ ,
• Proposition 16, for the range of $x$ giving a specific mechanical word,
• Proposition 17, showing the index is continuous for $x\in\mathbb{R}_{+}$ ,
• Proposition 19, showing the properties of $M(p)$ when $p$ is a palindrome,
• and Proposition 20, for weak supermajorisation with $\beta\neq 1$ .

A clarification of the extreme cases of Theorem 1 of the main paper is presented in the final section.

7 From $x$ -Threshold Policies to Mechanical Words

Some concepts relating to mechanical words appeared as early as 1771 in Jean Bernoulli’s study of continued fractions (Berstel et al., 2008). The term “mechanical sequences” appears in the work of Morse and Hedlund (Am. J. Math., Vol 62, No. 1, 1940, p. 1-42) who had just introduced the term “symbolic dynamics”. Morse and Hedlund studied the concept from the perspective of sequences of the form $\lfloor c+k\beta\rfloor$ for $c,\beta\in\mathbb{R}$ and $k\in\mathbb{Z}$ . They also studied the concept from the perspective of differential equations, motivating the term “Sturmian sequences.” Since that time there has been tremendous progress in the study of such sequences from the perspective of Combinatorics on Words (Lothaire, 2002). However, the recent (and highly approachable) paper of Rajpathak, Pillai and Bandyopadhyay (Chaos, Vol. 22, 2012) on the piecewise-linear map-with-a-gap discovers such sequences without recognising them as mechanical sequences. Proposition 16 of this section is a substantial generalisation of that result and we could not find this proposition explicitly stated in the literature. Our result is not surprising if one has the intuition that there is a topological conjugacy between the maps of this section and the piecewise-linear map-with-a-gap. However, it might be difficult to explicitly identify the appropriate topological conjugacy and thereby prove our result for all cases considered here.

7.1 Definitions

Let $\pi$ denote a word consisting of a string of 0s and 1s in which the $k^{th}$ letter is $\pi_{k}$ and letters $i,i+1,\dots,j$ are $\pi_{i:j}$ . Let ${\left|\pi\right|}$ be the length of $\pi$ and ${\left|\pi\right|}_{w}$ for a word $w$ be the number of times that word $w$ appears in $\pi$ . Let $\epsilon$ denote the empty word and $\pi^{\omega}$ denote the infinite word constructed by repeatedly concatenating $\pi$ .

Consider two functions $\phi_{0}:\mathcal{I}\rightarrow\mathcal{I}$ and $\phi_{1}:\mathcal{I}\rightarrow\mathcal{I}$ where $\mathcal{I}$ is an interval of $\mathbb{R}.$ We define the transformation $\phi_{\pi}:\mathcal{I}\rightarrow\mathcal{I}$ for any word $\pi$ by the composition

\displaystyle\phi_{\pi}(x):=\phi_{\pi_{{\left|\pi\right|}}}\circ\cdots\circ\phi_{\pi_{2}}\circ\phi_{\pi_{1}}(x).

Let $y_{\pi}\in\mathcal{I}$ be the fixed point of $\phi_{\pi}$ , so $\phi_{\pi}(y_{\pi})=y_{\pi}$ , assuming a unique fixed point on $\mathcal{I}$ exists.

Given $x\in\mathcal{I}$ , we call the sequence $(x_{k}:k\geq 1)$ the $x$ -threshold orbit for $\phi_{0},\phi_{1}$ if

\displaystyle x_{1}=\phi_{1}(x),\qquad x_{k+1}=\begin{cases}\phi_{1}(x_{k})&\text{if $x_{k}\geq x$}\\ \phi_{0}(x_{k})&\text{if $x_{k}<x$}\end{cases}\qquad\text{for $k\geq 1$}.

We call $\pi$ the $x$ -threshold word for $\phi_{0},\phi_{1}$ if it is the shortest word such that $x_{k+1}=\phi_{(\pi^{\omega})_{k}}(x_{k})$ for all $k\geq 1$ . We shall just write $x$ -threshold orbit and $x$ -threshold word where $\phi_{0},\phi_{1}$ are obvious from the context.
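To make the definitions of the $x$ -threshold orbit and $x$ -threshold word concrete, here is a minimal sketch. It assumes a pair of illustrative affine maps, $\phi_{0}(x)=x/2+1$ and $\phi_{1}(x)=x/2$ on $\mathcal{I}=[0,2]$ , which are our own choice and not the Kalman-filter maps of the main paper. The function names are hypothetical, and the period is only estimated from a finite prefix of the orbit.

```python
from fractions import Fraction


# Illustrative maps only (an assumption, not the paper's Kalman maps):
# both are increasing strict contractions on [0, 2], with fixed points
# y_1 = 0 < y_0 = 2.
def phi0(x):
    return x / 2 + 1


def phi1(x):
    return x / 2


def threshold_letters(x, count):
    """First `count` letters of the infinite x-threshold word."""
    xk = phi1(x)                      # x_1 = phi_1(x)
    letters = []
    for _ in range(count):
        letter = 1 if xk >= x else 0  # letter k drives x_k -> x_{k+1}
        letters.append(letter)
        xk = phi1(xk) if letter else phi0(xk)
    return letters


def shortest_period(letters):
    """Shortest word whose repetition reproduces the observed prefix."""
    n = len(letters)
    for p in range(1, n + 1):
        if all(letters[i] == letters[i % p] for i in range(n)):
            return "".join(str(c) for c in letters[:p])


print(shortest_period(threshold_letters(Fraction(1), 12)))
```

Exact rational arithmetic is used so that the comparison $x_{k}\geq x$ is not affected by rounding.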

For $p\geq 1$ , let $L_{p},R_{p}$ be the morphisms (substitutions)

\displaystyle L_{p}:\begin{cases}0\rightarrow 0^{p+1}1\\ 1\rightarrow 0^{p}1\end{cases}

\displaystyle R_{p}:\begin{cases}0\rightarrow 01^{p}\\ 1\rightarrow 01^{p+1}\end{cases}.

We say $\pi$ is a valid word if $\pi\in\{0,1\}$ or $\pi\in\{L_{p}(w),R_{p}(w):p\geq 1\}$ for some valid word $w$ .
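The recursive definition of valid words can be explored with a short sketch. The function names `L`, `R` and `valid_words` are ours, and the enumeration is truncated by the hypothetical parameters `depth` and `max_p`, so it only lists a finite subset of the valid words.

```python
def apply_morphism(images, word):
    """Apply a substitution letter by letter; `images` maps '0'/'1' to words."""
    return "".join(images[letter] for letter in word)


def L(p, word):
    """L_p: 0 -> 0^(p+1) 1 and 1 -> 0^p 1."""
    return apply_morphism({"0": "0" * (p + 1) + "1", "1": "0" * p + "1"}, word)


def R(p, word):
    """R_p: 0 -> 0 1^p and 1 -> 0 1^(p+1)."""
    return apply_morphism({"0": "0" + "1" * p, "1": "0" + "1" * (p + 1)}, word)


def valid_words(depth, max_p):
    """Valid words reachable with at most `depth` morphisms, each with p <= max_p."""
    words = {"0", "1"}
    for _ in range(depth):
        new_words = set()
        for word in words:
            for p in range(1, max_p + 1):
                new_words.add(L(p, word))
                new_words.add(R(p, word))
        words |= new_words
    return words


print(sorted(valid_words(1, 2), key=lambda w: (len(w), w)))
```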

Remark. The morphisms $L_{p},R_{p}$ generate the Christoffel tree so valid words are mechanical words. To see this, note that the Christoffel tree is generated by the following morphisms (Berstel et al, 2008, p. 37)

\displaystyle G:\begin{cases}0\rightarrow 0\\ 1\rightarrow 01\end{cases}\qquad\qquad{\tilde{D}}:\begin{cases}0\rightarrow 01\\ 1\rightarrow 1\end{cases}.

We may translate (from English to French) as $L_{p}=G^{p}\circ\tilde{D}$ and $R_{p}=\tilde{D}^{p}\circ G$ so any composition of $L_{p}$ and $R_{p}$ can be written as a composition of $G$ and $\tilde{D}$ . Likewise, any composition of $G$ and $\tilde{D}$ can be written as a composition of $L_{p}$ and $R_{p}$ . Specifically if $p_{k},q_{k},p_{k+1}\geq 2$ then

	$\displaystyle\cdots\circ G^{p_{k}-1}\circ{\tilde{D}}^{q_{k}}\circ G^{p_{k+1}}\circ{\tilde{D}}\circ\cdots$
	$\displaystyle\quad=\cdots\circ(G^{p_{k}-1}\circ{\tilde{D}})\circ({\tilde{D}}^{q_{k}-1}\circ G)\circ(G^{p_{k+1}-1}\circ{\tilde{D}})\circ\cdots$
	$\displaystyle\quad=\cdots\circ L_{p_{k}-1}\circ R_{q_{k}-1}\circ L_{p_{k+1}-1}\circ\cdots$

whereas if $q_{k}=1$ we have

	$\displaystyle\cdots\circ G^{p_{k}-1}\circ{\tilde{D}}\circ G^{p_{k+1}}\circ{\tilde{D}}\circ\cdots$
	$\displaystyle\quad=\cdots\circ(G^{p_{k}-1}\circ{\tilde{D}})\circ(G^{p_{k+1}}\circ{\tilde{D}})\circ\cdots$
	$\displaystyle\quad=\cdots\circ L_{p_{k}-1}\circ L_{p_{k+1}}\circ\cdots.$

A symmetric argument holds if $p_{k}=1$ or $p_{k+1}=1$ .
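The translations $L_{p}=G^{p}\circ\tilde{D}$ and $R_{p}=\tilde{D}^{p}\circ G$ in the remark above can be checked mechanically on short words. This is a minimal sketch in which `D` stands for $\tilde{D}$ and all function names are our own; since these are morphisms, agreement on the letters 0 and 1 already implies agreement on all words, so the exhaustive loop is only a sanity check.

```python
from itertools import product


def substitute(images, word):
    """Apply a substitution letter by letter."""
    return "".join(images[letter] for letter in word)


def G(word):
    return substitute({"0": "0", "1": "01"}, word)


def D(word):
    """The morphism written D-tilde in the text."""
    return substitute({"0": "01", "1": "1"}, word)


def iterate(morphism, times, word):
    """Apply `morphism` to `word` the given number of times."""
    for _ in range(times):
        word = morphism(word)
    return word


def L(p, word):
    return substitute({"0": "0" * (p + 1) + "1", "1": "0" * p + "1"}, word)


def R(p, word):
    return substitute({"0": "0" + "1" * p, "1": "0" + "1" * (p + 1)}, word)


def identities_hold(max_p, max_len):
    """Check L_p = G^p o D and R_p = D^p o G on all binary words up to max_len."""
    for n in range(1, max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            for p in range(1, max_p + 1):
                if L(p, word) != iterate(G, p, D(word)):
                    return False
                if R(p, word) != iterate(D, p, G(word)):
                    return False
    return True


print(identities_hold(4, 5))
```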

7.2 Fixed Points

Throughout, we make the following assumption about $\phi_{0},\phi_{1}$ . The existence of fixed points $y_{0},y_{1}$ is addressed immediately thereafter.

Assumption A2. Functions $\phi_{0}:\mathcal{I}\rightarrow\mathcal{I},\phi_{1}:\mathcal{I}\rightarrow\mathcal{I}$ , where $\mathcal{I}$ is an interval of $\mathbb{R}$ , are increasing and non-expansive. Equivalently, for all $x,y\in\mathcal{I}:x<y$ and for $k\in\{0,1\}$ we have

\displaystyle\underbrace{\phi_{k}(x)<\phi_{k}(y)}_{\text{increasing}}\qquad\qquad\text{and}\qquad\qquad\underbrace{\phi_{k}(y)-\phi_{k}(x)<y-x}_{\text{non-expansive}}.

Furthermore, the fixed points $y_{0},y_{1}$ of $\phi_{0},\phi_{1}$ satisfy $y_{1}<y_{0}$ .

Proposition 8.

Suppose A2 holds, that $x\in\mathcal{I}$ and that $w$ is any non-empty word. Then $\phi_{w}(x)$ is increasing and non-expansive. Further, the fixed point $y_{w}$ exists and is unique.

Proof.

First we show that $\phi_{w}(x)$ is increasing, by induction. In the base case, ${\left|w\right|}=1$ and the claim follows from A2. For the inductive step assume $\phi_{u}(x)$ is increasing, where $w=au$ for some $a\in\{0,1\}$ and word $u$ . Then for any $x,y\in\mathcal{I}:x<y$ ,

$\displaystyle\phi_{w}(y)$	$\displaystyle=\phi_{u}(\phi_{a}(y))$
	$\displaystyle>\phi_{u}(\phi_{a}(x))$	as $\phi_{a}(y)>\phi_{a}(x)$ and $\phi_{u}$ is increasing
	$\displaystyle=\phi_{w}(x).$

Therefore $\phi_{w}$ is increasing.

Now we show that $\phi_{w}(x)$ is non-expansive, by induction. If ${\left|w\right|}=1$ then this follows from A2. Else, say $\phi_{u}(x)$ is non-expansive where $w=ua$ and $a\in\{0,1\}$ . Then for any $x,y\in\mathcal{I}:x<y$ ,

$\displaystyle\phi_{w}(y)-\phi_{w}(x)$	$\displaystyle=\phi_{a}(\phi_{u}(y))-\phi_{a}(\phi_{u}(x))$
	$\displaystyle<\phi_{u}(y)-\phi_{u}(x)$	as $\phi_{u}(y)>\phi_{u}(x)$ and $\phi_{a}$ is non-expansive
	$\displaystyle<y-x$	as $\phi_{u}$ is non-expansive.

Therefore $\phi_{w}$ is non-expansive.

Let $\psi(x):=\max\{\phi_{0}(x),\phi_{1}(x)\}$ . As $\phi_{1}$ is non-expansive we have

\displaystyle y_{1}=\phi_{1}(y_{1})>\phi_{1}(y_{0})+y_{1}-y_{0}

which rearranges to give $\phi_{1}(y_{0})<y_{0}$ , so that $\psi(y_{0})=y_{0}$ . Also $\psi$ is increasing as $\phi_{0},\phi_{1}$ are increasing, so $\phi_{w}(y_{0})\leq\psi^{({\left|w\right|})}(y_{0})=y_{0}$ .

We now prove that $y_{w}$ exists. The argument of the previous paragraph shows that $g(x):=x-\phi_{w}(x)$ satisfies $g(y_{0})\geq 0$ . A symmetric argument leads to the conclusion that $g(y_{1})\leq 0$ . As $\phi_{w}$ is increasing and non-expansive it is continuous, so $g(x)$ is a continuous function and, by the intermediate value theorem, there is some $y\in[y_{1},y_{0}]$ for which $g(y)=0$ . Equivalently $y=\phi_{w}(y)$ . Therefore a fixed point $y_{w}$ exists.

To show that the fixed point is unique, suppose both $y$ and $z$ are fixed points with $y>z$ . As $\phi_{w}$ is non-expansive we have $\frac{\phi_{w}(y)-\phi_{w}(z)}{y-z}<1$ . Yet, as $\phi_{w}(y)=y,\phi_{w}(z)=z$ we have

\displaystyle\frac{\phi_{w}(y)-\phi_{w}(z)}{y-z}=1.

This is a contradiction. Therefore the fixed point is unique. ∎
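The existence argument suggests a simple way to compute $y_{w}$ numerically: bisect on the sign of $g(x)=x-\phi_{w}(x)$ over $[y_{1},y_{0}]$ . The sketch below assumes the illustrative affine maps $\phi_{0}(x)=x/2+1$ and $\phi_{1}(x)=x/2$ , with $y_{1}=0$ and $y_{0}=2$ ; these maps and the function names are our own choices and not those of the main paper.

```python
# Illustrative maps (an assumption): phi_0(x) = x/2 + 1, phi_1(x) = x/2.
Y1, Y0 = 0.0, 2.0   # fixed points of phi_1 and phi_0 for these maps


def phi(word, x):
    """phi_word(x): apply the letters of `word` from left to right."""
    for letter in word:
        x = 0.5 * x + 1.0 if letter == "0" else 0.5 * x
    return x


def fixed_point(word, tol=1e-12):
    """Bisection on g(x) = x - phi_word(x), using g(Y1) <= 0 <= g(Y0)."""
    lo, hi = Y1, Y0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - phi(word, mid) < 0.0:
            lo = mid      # g(mid) < 0, so the fixed point lies to the right
        else:
            hi = mid      # g(mid) >= 0, so the fixed point lies to the left
    return 0.5 * (lo + hi)


print(fixed_point("01"), fixed_point("10"))
```

Bisection is valid here because $g$ is increasing whenever $\phi_{w}$ is non-expansive in the strict sense of A2, which is also what gives uniqueness.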

Given a word $w$ , the next proposition shows when the transformation $\phi_{w}$ increases or decreases its argument and what might be deduced from such an increase or decrease.

Proposition 9.

Suppose A2 holds, $x\in\mathcal{I}$ and $w$ is any non-empty word. Then

\displaystyle x<\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)<y_{w}\ \Leftrightarrow\ x<y_{w}

and

\displaystyle x>\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)>y_{w}\ \Leftrightarrow\ x>y_{w}.

Proof.

We use Proposition 8 throughout the argument without further mention.

Say $x<y_{w}$ . As $\phi_{w}$ is increasing,

\displaystyle\phi_{w}(x)<\phi_{w}(y_{w})=y_{w}

where the equality is the definition of $y_{w}$ . Also, as $\phi_{w}$ is non-expansive,

\displaystyle y_{w}=\phi_{w}(y_{w})<\phi_{w}(x)+y_{w}-x

which rearranges to give $x<\phi_{w}(x)$ .

Now say $x>y_{w}$ . As above, we then have $\phi_{w}(x)>\phi_{w}(y_{w})=y_{w}$ and

\displaystyle y_{w}=\phi_{w}(y_{w})>\phi_{w}(x)+y_{w}-x

so that $x>\phi_{w}(x)$ .

The contrapositive of $x>y_{w}\Rightarrow\phi_{w}(x)>y_{w}$ is $\phi_{w}(x)\leq y_{w}\Rightarrow x\leq y_{w}$ . But if $\phi_{w}(x)\neq y_{w}$ then $x\neq y_{w}$ as $\phi_{w}$ is increasing and therefore injective. Thus $\phi_{w}(x)<y_{w}\Rightarrow x<y_{w}$ .

The contrapositive of $x>y_{w}\Rightarrow x>\phi_{w}(x)$ is $x\leq\phi_{w}(x)\Rightarrow x\leq y_{w}$ . But if $x\neq\phi_{w}(x)$ then $x\neq y_{w}$ as $y_{w}$ is a fixed point. So we can conclude that $x<\phi_{w}(x)\Rightarrow x<y_{w}$ .

By symmetry, $\phi_{w}(x)>y_{w}\Rightarrow x>y_{w}$ and $x>\phi_{w}(x)\Rightarrow x>y_{w}$ . This completes the proof. ∎

Proposition 10.

Suppose A2 holds and $\pi$ is any word satisfying ${\left|\pi\right|}_{0}{\left|\pi\right|}_{1}>0$ . Then $y_{1}<y_{\pi}<y_{0}$ .

Proof.

Say $y_{\pi}\leq y_{1}$ . As ${\left|\pi\right|}_{0}>0$ we can write $\pi=:1^{q}0s$ for some $q\geq 0$ and some word $s$ , so that the letters of $\pi$ are applied in the order given by the definition of $\phi_{\pi}$ . Thus

$\displaystyle y_{\pi}=\phi_{\pi}(y_{\pi})$	$\displaystyle\leq\phi_{1^{q}0s}(y_{1})$	as $\phi_{\pi}$ is increasing
	$\displaystyle=\phi_{0s}(y_{1})$	as $\phi_{\epsilon}(y_{1})=\phi_{1}(y_{1})=y_{1}$
	$\displaystyle>\phi_{s}(y_{1})$	as $\phi_{0}(y_{1})>y_{1}$ by Proposition 9 and $\phi_{s}$ is increasing
	$\displaystyle\geq y_{1}$	by repeating the same argument if ${\left|s\right|}_{0}>0$ .

But this contradicts $y_{\pi}\leq y_{1}$ . Therefore $y_{\pi}>y_{1}$ .

A symmetrical argument leads to the conclusion that $y_{\pi}<y_{0}$ . ∎

Proposition 11.

If A2 holds and $n\geq 1$ then $y_{10^{n-1}}<y_{010^{n-1}}<y_{10^{n}}$ and $y_{01^{n}}<y_{101^{n-1}}<y_{01^{n-1}}.$

Proof.

As $y_{10^{n-1}}<y_{0}$ by Proposition 10 we have $\phi_{0}(y_{10^{n-1}})>y_{10^{n-1}}$ by Proposition 9 so that

\displaystyle\phi_{010^{n-1}}(y_{10^{n-1}})=\phi_{10^{n-1}}(\phi_{0}(y_{10^{n-1}}))>\phi_{10^{n-1}}(y_{10^{n-1}})=y_{10^{n-1}}

so Proposition 9 gives $y_{010^{n-1}}>y_{10^{n-1}}.$

Furthermore $y_{10^{n}}=\phi_{0}(y_{010^{n-1}})$ by definition of $y_{\pi}$ and $y_{010^{n-1}}<y_{0}$ by Proposition 10 so that $\phi_{0}(y_{010^{n-1}})>y_{010^{n-1}}$ by Proposition 9. Thus $y_{10^{n}}>y_{010^{n-1}}$ .

The proof that $y_{01^{n}}<y_{101^{n-1}}<y_{01^{n-1}}$ is symmetrical. ∎
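The two chains of inequalities in Proposition 11 can be illustrated exactly for a specific pair of maps. The sketch assumes the illustrative affine maps $\phi_{0}(x)=x/2+1$ and $\phi_{1}(x)=x/2$ (our choice, not the maps of the main paper), for which every $\phi_{w}$ is affine and its fixed point is available in closed form. All names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# (slope, intercept) of the illustrative maps phi_0 and phi_1 (an assumption).
AFFINE = {"0": (HALF, Fraction(1)), "1": (HALF, Fraction(0))}


def compose(word):
    """Slope and intercept of phi_word, applying letters left to right."""
    a, b = Fraction(1), Fraction(0)
    for letter in word:
        s, t = AFFINE[letter]
        a, b = s * a, s * b + t
    return a, b


def fixed_point(word):
    """Exact fixed point y_word of the affine map phi_word."""
    a, b = compose(word)
    return b / (1 - a)


def first_chain(n):
    """(y_{10^{n-1}}, y_{010^{n-1}}, y_{10^n})."""
    zeros = "0" * (n - 1)
    return fixed_point("1" + zeros), fixed_point("01" + zeros), fixed_point("10" + zeros)


def second_chain(n):
    """(y_{01^n}, y_{101^{n-1}}, y_{01^{n-1}})."""
    ones = "1" * (n - 1)
    return fixed_point("01" + ones), fixed_point("10" + ones), fixed_point("0" + ones)


for n in range(1, 4):
    print(n, first_chain(n), second_chain(n))
```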

Proposition 12.

Suppose A2 holds, $M\in\{L_{q},R_{q}:q\geq 1\}$ and $\tilde{w}$ is any word. Let $\tilde{y}_{v}$ be the fixed point of $\tilde{\phi}_{v}:=\phi_{M(v)}$ for any word $v$ and let $0w1:=M(0\tilde{w}1)$ . Then

\displaystyle\tilde{x}\in[\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}]\ \Leftrightarrow\ x:=\phi_{0^{q}}(\tilde{x})\in[y_{01w},y_{10w}].

Proof.

Say $M=L_{q}$ . Note that

	$\displaystyle\phi_{0^{q}}(\tilde{y}_{01\tilde{w}})$	$\displaystyle=\phi_{0^{q}}(y_{L_{q}(01\tilde{w})})$	as $\tilde{y}_{v}$ is the fixed point of $\tilde{\phi}_{v}=\phi_{L_{q}(v)}$
		$\displaystyle=\phi_{0^{q}}(y_{0^{q}01L_{q}(1\tilde{w})})$	as $L_{q}(0)=0^{q}01$
		$\displaystyle=y_{01L_{q}(1\tilde{w})0^{q}}$	as $\phi_{a}(y_{ab})=y_{ba}$ for any words $a,b$
		$\displaystyle=y_{01w}$	as $0w1=L_{q}(0\tilde{w}1)=0L_{q}(1\tilde{w})0^{q}1$
and
	$\displaystyle\phi_{0^{q}}(\tilde{y}_{10\tilde{w}})$	$\displaystyle=\phi_{0^{q}}(y_{L_{q}(10\tilde{w})})$
		$\displaystyle=\phi_{0^{q}}(y_{0^{q}1L_{q}(0\tilde{w})})$
		$\displaystyle=y_{1L_{q}(0\tilde{w})0^{q}}$
		$\displaystyle=y_{10w}$	$\displaystyle\text{as $0w1=L_{q}(0\tilde{w})0^{q}1$}.$

Proposition 8 shows that $\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}$ exist. So the above equalities show that an inverse $\phi^{(-1)}_{0^{q}}(x)$ exists for $x\in\{y_{01w},y_{10w}\}$ . As $\phi_{0^{q}}$ is increasing and continuous, we have

\displaystyle x\in[y_{01w},y_{10w}]\ \Leftrightarrow\ \tilde{x}\in[\phi^{(-1)}_{0^{q}}(y_{01w}),\phi^{(-1)}_{0^{q}}(y_{10w})]=[\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}].

The proof for $M=R_{q}$ is symmetric. ∎
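The proof above relies on the rotation identity $\phi_{a}(y_{ab})=y_{ba}$ . A minimal sketch checking it exactly, again under the assumption of the illustrative maps $\phi_{0}(x)=x/2+1$ and $\phi_{1}(x)=x/2$ (not the maps of the main paper), is given below. The instance tested corresponds to $q=1$ and $0\tilde{w}1=01$ , so that $0w1=L_{1}(01)=00101$ .

```python
from fractions import Fraction


def phi(word, x):
    """phi_word(x) for the illustrative maps, letters applied left to right."""
    for letter in word:
        x = x / 2 + 1 if letter == "0" else x / 2
    return x


def fixed_point(word):
    """Exact fixed point of the affine map phi_word (slope 2^-|word|)."""
    intercept = phi(word, Fraction(0))
    slope = Fraction(1, 2 ** len(word))
    return intercept / (1 - slope)


def rotation_identity_holds(word):
    """Check phi_a(y_ab) == y_ba for every split word = ab with a, b non-empty."""
    return all(
        phi(word[:i], fixed_point(word)) == fixed_point(word[i:] + word[:i])
        for i in range(1, len(word))
    )


# Endpoints in Proposition 12 for q = 1 and 0w1 = L_1(01) = 00101, so w = 010.
left = phi("0", fixed_point("00101"))    # should equal y_{01w} = y_{01010}
right = phi("0", fixed_point("01001"))   # should equal y_{10w} = y_{10010}
print(left, right)
```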

7.3 $x$ -Threshold Words

Proposition 13.

Suppose A2 holds, $\pi$ is the $x$ -threshold word and $n\geq 1$ . Then

1. $x\leq y_{10^{n-1}}\Rightarrow{\left|\pi^{\omega}\right|}_{0^{n}}=0$
2. $x\geq y_{010^{n-1}}\Rightarrow{\left|\pi^{\omega}\right|}_{10^{n-1}1}=0$
3. $x\geq y_{01^{n-1}}\Rightarrow{\left|\pi^{\omega}\right|}_{1^{n}}=0$
4. $x\leq y_{101^{n-1}}\Rightarrow{\left|\pi^{\omega}\right|}_{01^{n-1}0}=0$

Proof.

If $x\leq y_{1}$ then it follows from Proposition 9 that the $x$ -threshold word is $\pi=1$ . Likewise if $x>y_{0}$ then the $x$ -threshold word is $\pi=0$ . In these cases Claims 1 and 2 hold, so in the following we assume that $y_{1}<x\leq y_{0}$ .

Claim 1: Let $(x_{k})$ be the $x$ -threshold orbit. If $(\pi^{\omega})_{k:k+n-2}=0^{n-1}$ for some $k$ , then

$\displaystyle x_{k+n-1}$	$\displaystyle=\phi_{0^{n-1}}(x_{k})$	by definition of $(x_{k})$
	$\displaystyle\geq\phi_{0^{n-1}}(\phi_{1}(x))$	as $x_{k}\geq\phi_{1}(x)$ for all $k\geq 0$ and $\phi_{0^{n-1}}$ is increasing
	$\displaystyle=\phi_{10^{n-1}}(x)$
	$\displaystyle\geq x$	if $x\leq y_{10^{n-1}}$ by Proposition 9.

But if $x_{k+n-1}\geq x$ then $(\pi^{\omega})_{k+n-1}=1$ by definition of $\pi$ . Therefore ${\left|\pi^{\omega}\right|}_{0^{n}}=0.$

Claim 2: Let $(x_{k})$ be the $x$ -threshold orbit. If $(\pi^{\omega})_{k:k+n-1}=10^{n-1}$ for some $k$ , then

$\displaystyle x_{k+n}$	$\displaystyle=\phi_{10^{n-1}}(x_{k})$
	$\displaystyle<\phi_{10^{n-1}}(\phi_{0}(x))$	as $x_{k}<\phi_{0}(x)$ for all $k\geq 0$ and $\phi_{10^{n-1}}$ is increasing
	$\displaystyle=\phi_{010^{n-1}}(x)$
	$\displaystyle\leq x$	if $x\geq y_{010^{n-1}}$ by Proposition 9.

But if $x_{k+n}<x$ then $(\pi^{\omega})_{k+n}=0$ . Therefore ${\left|\pi^{\omega}\right|}_{10^{n-1}1}=0.$

The proof of Claims 3 and 4 is symmetrical. ∎

Proposition 14.

Suppose A2 holds and $\pi$ is an $x$ -threshold word. Then

1. ${\left|\pi\right|}_{00}>0\Rightarrow\pi=L_{n}(w)$ for some word $w$ and some $n\geq 1$
2. ${\left|\pi\right|}_{11}>0\Rightarrow\pi=R_{n}(w)$ for some word $w$ and some $n\geq 1$

Proof.

First, applying Claims 1 and 3 of Proposition 13 with $n=2$ we have ${\left|\pi\right|}_{00}=0$ for $x\leq y_{10}$ and ${\left|\pi\right|}_{11}=0$ for $x\geq y_{01}$ . Furthermore $y_{10}=\phi_{0}(y_{01})>y_{01}$ by Proposition 9. Thus $\pi$ cannot contain both 00 and 11.

So, if ${\left|\pi\right|}_{00}>0$ then $\pi$ is of the form $0^{q_{1}}10^{q_{2}}1\dots$ with strings of 0s separated by individual 1s. Let $q:=\min_{k}q_{k}$ . By Propositions 11 and 13, $I_{q}:=(y_{10^{q-1}},y_{010^{q}})$ is the only set of $x$ values for which $\pi^{\omega}$ can contain $10^{q}1$ . Thus $\pi^{\omega}$ can only contain both $10^{q}1$ and $10^{q+1}1$ in the interval

\displaystyle F_{q}:=I_{q}\cap I_{q+1}=(y_{10^{q-1}},y_{010^{q}})\cap(y_{10^{q}},y_{010^{q+1}})=(y_{10^{q}},y_{010^{q}})

noting Proposition 11 gives $y_{10^{q-1}}<y_{010^{q-1}}<y_{10^{q}}<y_{010^{q}}.$

Finally, we have $F_{q}\cap F_{q^{\prime}}=\emptyset$ for $q\neq q^{\prime}$ , which also follows from Proposition 11. Thus if ${\left|\pi\right|}_{00}>0$ then $\pi$ is a concatenation of $L_{q}(0)$ and $L_{q}(1)$ . Equivalently $\pi=L_{q}(w)$ for some word $w$ and some $q\geq 1$ as in Claim 1.

The proof of Claim 2 is symmetric. ∎

Proposition 15.

Suppose A2 holds and $\pi$ is an $x$ -threshold word. Then $\pi$ is a valid word.

Proof.

There are three cases to consider: either ${\left|\pi\right|}_{00}={\left|\pi\right|}_{11}=0$ or ${\left|\pi\right|}_{00}>0$ or ${\left|\pi\right|}_{11}>0$ .

First case: The only non-empty words not containing $00$ or $11$ are $0,1,(01)^{n},(10)^{n}$ for some $n\geq 1$ . Now $x$ -threshold words start with 0 unless $x\leq y_{1}$ (in which case $\pi=1$ ) so $\pi\neq(10)^{n}$ . Further, the $x$ -threshold word was defined to be the shortest word such that $x_{k+1}=\phi_{(\pi^{\omega})_{k}}(x_{k})$ so this leaves us with the options $0,1,01$ . These are all valid words.

Second case: If $\pi$ contains 00, we may write $\pi=L_{q}(w)$ for some word $w$ , by Proposition 14. Now from point $x_{k}$ on the $x$ -threshold orbit we have $\pi_{k:k+q}=0^{q+1}$ if and only if $\phi_{0^{q}}(x_{k})<x$ which corresponds to $x_{k}<\phi_{0}^{(-q)}(x)=:\tilde{x}$ . So the word $w$ corresponds to a $\tilde{x}$ -threshold orbit $(\tilde{x}_{k}:k\geq 1)$ for $\psi_{0}(x):=\phi_{0^{q+1}1}(x),\psi_{1}(x):=\phi_{0^{q}1}(x)$ . To spell it out, we have

\displaystyle\tilde{x}_{1}=\psi_{1}(\tilde{x}),\qquad\tilde{x}_{k+1}=\psi_{w_{k}}(\tilde{x}_{k}),\qquad w_{k}=\begin{cases}1&\text{if $\tilde{x}_{k}\geq\tilde{x}$}\\ 0&\text{if $\tilde{x}_{k}<\tilde{x}$}\end{cases}\qquad\text{for $k\geq 1$}

and as for the original system, we define $\tilde{y}_{\pi}$ as the fixed point $\tilde{y}_{\pi}=\psi_{\pi}(\tilde{y}_{\pi})$ .

Now $\psi_{0},\psi_{1}$ are non-negative, as $\phi_{0},\phi_{1}$ are non-negative. Also $\psi_{0},\psi_{1}$ are monotonically increasing and non-expansive by Proposition 8. Further,

\displaystyle\phi_{0^{q+1}1}(y_{0^{q}1})=\phi_{0^{q}1}(\phi_{0}(y_{0^{q}1}))>\phi_{0^{q}1}(y_{0^{q}1})=y_{0^{q}1}

so that $y_{0^{q+1}1}>y_{0^{q}1}$ by Proposition 9. But by definition $\tilde{y}_{0}=y_{0^{q+1}1}$ and $\tilde{y}_{1}=y_{0^{q}1}$ , so that $\tilde{y}_{1}<\tilde{y}_{0}$ . Therefore $\psi_{0},\psi_{1}$ satisfy A2.

Third case: We prove that $\pi=R_{q}(w)$ for some positive integer $q$ and word $w$ . We also show that word $w$ is a $\hat{x}$ -threshold word for a pair of functions (say) $\chi_{0},\chi_{1}$ which satisfy A2. The argument is symmetric to the second case, so it is omitted.

In conclusion, either

1. $\pi\in\{0,1,L_{1}(1)\}$ which are valid words,
2. $\pi=L_{q}(w)$ where $w$ is a $\tilde{x}$ -threshold word for $\psi_{0},\psi_{1}$ which satisfy Propositions 8-14 and therefore $w$ satisfies this conclusion,
3. or $\pi=R_{q}(w)$ where $w$ is a $\hat{x}$ -threshold word for $\chi_{0},\chi_{1}$ which satisfy Propositions 8-14 and therefore $w$ satisfies this conclusion.

Thus $\pi$ is a valid word. This completes the proof. ∎

The following proposition shows that all valid words are $x$ -threshold words and tells us explicitly which values of $x$ produce a given valid word. It is one of the key results of the main paper.

Proposition 16.

Suppose A2 is satisfied and $0w1$ is any valid word. Then

\displaystyle\text{$0w1$ is the $x$-threshold word}\ \Leftrightarrow\ x\in[y_{01w},y_{10w}].

Proof.

Let $V_{1}:=\{L_{q}(1),R_{q}(1):q\geq 1\},V_{n+1}:=\{L_{q}(v),R_{q}(v):v\in V_{n},q\geq 1\}$ . Note that $V_{1}$ contains $L_{q}(0)=0^{q+1}1=L_{q+1}(1)$ and $R_{q}(0)=01^{q}$ which for $q\geq 2$ equals $R_{q-1}(1)$ and for $q=1$ equals $01=L_{1}(1)$ . Thus $\cup_{n=1}^{\infty}V_{n}$ is the set of all valid words of form $0w1$ .

We use induction with hypothesis

\displaystyle H_{n}:\quad 0w1\in V_{n}\ \text{is the $x$-threshold word}\ \Leftrightarrow\ x\in[y_{01w},y_{10w}]

Base case ( $H_{1}$ ). Say $0w1=0^{q}1$ is the $x$ -threshold word. Then

$\displaystyle x$	$\displaystyle>\phi_{(10^{q})^{n}10^{q-1}}(x)$	for all $n\geq 0$
	$\displaystyle=\phi_{(010^{q-1})^{n}}(\phi_{10^{q-1}}(x))$
$\displaystyle\Rightarrow\ x$	$\displaystyle\geq\lim_{n\rightarrow\infty}\phi_{(010^{q-1})^{n}}(\phi_{10^{q-1}}(x))=y_{010^{q-1}}.$

The definition of the $x$ -threshold word also gives $x\leq\phi_{10^{q}}(x)$ . Therefore $x\leq y_{10^{q}}$ by Proposition 9. Thus if $0^{q}1$ is the $x$ -threshold word then $x\in[y_{01w},y_{10w}]$ .

Now say $x\in[y_{010^{q-1}},y_{10^{q}}]$ . Proposition 10 gives $y_{1}<x<y_{0}$ so that the $x$ -threshold orbit $(x_{k})$ is contained in $(y_{1},y_{0})$ . So Proposition 9 shows that $\phi_{0}(x_{k})>x_{k}$ and $\phi_{1}(x_{k})<x_{k}$ for all $k\geq 0$ . So to prove that the $x$ -threshold word is $0^{q}1$ we need only show that $\phi_{(10^{q})^{n}10^{q-1}}(x)<x$ and $\phi_{(10^{q})^{n}}(x)\geq x$ for all $n\geq 0$ . But if $x\geq y_{010^{q-1}}$ then for all $n\geq 0$

$\displaystyle x$	$\displaystyle\geq\phi_{(010^{q-1})^{n}}(x)$	by Proposition 9
	$\displaystyle>\phi_{(010^{q-1})^{n}}(\phi_{10^{q-1}}(x))$	as $y_{10^{q-1}}<y_{010^{q-1}}\leq x$ by Proposition 11
	$\displaystyle=\phi_{(10^{q})^{n}10^{q-1}}(x).$

Also if $x\leq y_{10^{q}}$ then $\phi_{(10^{q})^{n}}(x)\geq x$ for all $n\geq 0$ by Proposition 9. Therefore for $0w1=0^{q}1$ , we have $x\in[y_{01w},y_{10w}]$ implies that $0w1$ is the $x$ -threshold word.

For $0w1=01^{q}$ , the proof that $\pi=01^{q}\Leftrightarrow x\in[y_{01w},y_{10w}]$ is symmetric, so it is omitted.

Inductive Step. Assume $0\tilde{w}1$ satisfies $H_{n}$ .

Say $0w1=L_{q}(0\tilde{w}1)$ . Let $k_{i}:={\left|L_{q}(((0\tilde{w}1)^{\omega})_{1:i-1})\right|}+1$ so $(\pi^{\omega})_{k_{i}}$ is aligned with the start of the $i^{th}$ letter of $(0\tilde{w}1)^{\omega}$ . Let $x_{k}:=\phi_{((10w)^{\omega})_{1:k}}(x),\tilde{x}_{i}:=x_{k_{i}},x=\phi_{0^{q}}(\tilde{x})$ and let $\tilde{y}_{v}$ denote the fixed point of $\tilde{\phi}_{v}:=\phi_{L_{q}(v)}$ for any word $v$ . Then we have

		$L_{q}(0\tilde{w}1)$ is the $x$ -threshold word for $\phi_{0},\phi_{1}$
	$\displaystyle\Leftrightarrow\qquad$	$((0w1)^{\omega})_{k_{i}:k_{i}+q}=0^{q+1}$ if and only if $\phi_{0^{q}}(x_{k_{i}})<x$
	$\displaystyle\Leftrightarrow\qquad$	$((0\tilde{w}1)^{\omega})_{i}=0$ if and only if $\tilde{x}_{i}<\tilde{x}$
	$\displaystyle\Leftrightarrow\qquad$	$0\tilde{w}1$ is the $\tilde{x}$ -threshold word for $\tilde{\phi}_{0},\tilde{\phi}_{1}$
	$\displaystyle\Leftrightarrow\qquad$	$\tilde{x}\in[\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}]$ as $0\tilde{w}1$ satisfies $H_{n}$
	$\displaystyle\Leftrightarrow\qquad$	$x\in[y_{01w},y_{10w}]$ by Proposition 12

Symmetrically we may conclude that $\pi=0w1=R_{q}(0\tilde{w}1)\Leftrightarrow x\in[y_{01w},y_{10w}]$ . Therefore $H_{n+1}$ is true.

This completes the proof. ∎
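Proposition 16 can be illustrated on a few valid words. The sketch below assumes the illustrative maps $\phi_{0}(x)=x/2+1$ and $\phi_{1}(x)=x/2$ (our choice, not the maps of the main paper), computes the interval $[y_{01w},y_{10w}]$ exactly, and compares a finite prefix of the threshold word at the midpoint with $(0w1)^{\omega}$ . Checking a finite prefix is of course weaker than the proposition itself, and all names are hypothetical.

```python
from fractions import Fraction


def phi(letter, x):
    """Illustrative maps (an assumption): phi_0(x) = x/2 + 1, phi_1(x) = x/2."""
    return x / 2 + 1 if letter == "0" else x / 2


def fixed_point(word):
    """Exact fixed point of the affine map phi_word."""
    slope, intercept = Fraction(1), Fraction(0)
    for letter in word:
        slope, intercept = slope / 2, phi(letter, intercept)
    return intercept / (1 - slope)


def threshold_prefix(x, count):
    """First `count` letters of the infinite x-threshold word, as a string."""
    xk, letters = phi("1", x), []
    for _ in range(count):
        letter = "1" if xk >= x else "0"
        letters.append(letter)
        xk = phi(letter, xk)
    return "".join(letters)


def interval(word):
    """[y_{01w}, y_{10w}] for a word of the form 0w1."""
    w = word[1:-1]
    return fixed_point("01" + w), fixed_point("10" + w)


def is_threshold_word(word, x, periods=4):
    """Compare `periods` repetitions of `word` with the threshold prefix at x."""
    return threshold_prefix(x, periods * len(word)) == word * periods


for word in ["01", "001", "011", "00101"]:
    lo, hi = interval(word)
    print(word, lo, hi, is_threshold_word(word, (lo + hi) / 2))
```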

8 Continuity of the Index

We showed that the Whittle index is increasing on the domain of each fixed Christoffel word. However, we also need to show that the index is continuous as we move between words. So here we prove the following proposition.

Proposition 17.

Suppose $\lambda(\cdot)$ is as in the main paper. Then $\lambda(x)$ is a continuous function of $x\in\mathbb{R}_{+}$ .

We use the following definitions.

Definition. Let $\tilde{w}$ be the reverse of word $w$ , $w^{\omega}$ be the word constructed by concatenating $w$ infinitely many times, ${\left|w\right|}$ be the length of word $w$ and ${\left|w\right|}_{u}$ be the number of times that word $u$ is a factor of $w$ .

Definition. For a possibly-infinite word $w$ and numbers $x\in\mathbb{R},\beta\in(0,1)$ define

	$\displaystyle S(w,x)$	$\displaystyle:=\sum_{n=0}^{{\left|w\right|}-1}\beta^{n}\phi_{w_{1:n}}(x)$
	$\displaystyle\lambda(0w1,x)$	$\displaystyle:=\frac{1-\beta^{{\left|0w1\right|}}}{1-\beta}\left(S((01w)^{\omega},x)-S((10w)^{\omega},x)\right).$

Remark. If $\pi$ is the $x$ -threshold word then $\lambda(x)=\lambda(\pi,x)$ where $\lambda(x)$ is the Whittle index.

Remark. For a word $ab$ , this definition gives

	$\displaystyle S(ab,x)$	$\displaystyle=S(a,x)+\beta^{{\left|a\right|}}S(b,\phi_{a}(x))$	(13)
so for ${\left|\phi_{a^{\omega}}(x)\right|}<\infty$ and $\beta\in(0,1)$ we have
	$\displaystyle S(a^{\omega}b,x)$	$\displaystyle=S(a^{\omega},x).$	(14)
Further, if $x_{a}=\phi_{a}(x_{a})$ then the formula for the sum of a geometric progression gives
	$\displaystyle S(a^{\omega},x_{a})$	$\displaystyle=\frac{S(a,x_{a})}{1-\beta^{{\left|a\right|}}}.$	(15)
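Identities (13) and (15) are easy to check numerically. The following minimal sketch assumes the illustrative maps $\phi_{0}(x)=x/2+1$ and $\phi_{1}(x)=x/2$ and a hypothetical discount factor `BETA = 0.9`, neither of which comes from the main paper. The infinite word $a^{\omega}$ is approximated by a long finite repetition, so the check of (15) is only up to a truncation error.

```python
BETA = 0.9  # hypothetical discount factor in (0, 1)


def step(letter, x):
    """Illustrative maps (an assumption): phi_0(x) = x/2 + 1, phi_1(x) = x/2."""
    return 0.5 * x + 1.0 if letter == "0" else 0.5 * x


def phi(word, x):
    """phi_word(x), applying letters left to right."""
    for letter in word:
        x = step(letter, x)
    return x


def S(word, x, beta=BETA):
    """S(w, x) = sum over n = 0..|w|-1 of beta^n * phi_{w_{1:n}}(x), finite w."""
    total, weight = 0.0, 1.0
    for letter in word:
        total += weight * x   # the term for n uses the first n letters only
        x = step(letter, x)
        weight *= beta
    return total


def gap_13(a, b, x):
    """|S(ab, x) - (S(a, x) + beta^|a| S(b, phi_a(x)))|."""
    return abs(S(a + b, x) - (S(a, x) + BETA ** len(a) * S(b, phi(a, x))))


def gap_15(a, x_a, repeats=400):
    """|S(a^omega, x_a) - S(a, x_a) / (1 - beta^|a|)| with a^omega truncated."""
    return abs(S(a * repeats, x_a) - S(a, x_a) / (1.0 - BETA ** len(a)))


print(gap_13("01", "001", 1.0), gap_15("01", 2.0 / 3.0))
```

Here $2/3$ is the fixed point of $\phi_{01}$ for the illustrative maps, as required by the hypothesis $x_{a}=\phi_{a}(x_{a})$ of (15).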

Definition. Let $X_{\pi}$ be the range of $x$ for which the $x$ -threshold word is $\pi$ .

The following construction is closely related to the beautiful Christoffel tree (Berstel et al, 2008).

Definition. Consider the mapping $C$ which takes a sequence of words and returns a sequence containing the original words mingled with the concatenation of neighbouring words as follows:

\displaystyle C((a,b,c,d,\dots,x,y,z)):=(a,ab,b,bc,c,cd,d,\dots,x,xy,y,yz,z).

Now consider the sequences $t_{k}:=C^{(k)}((0,1))$ for $k\geq 0$ . The first few such sequences are

\displaystyle t_{0}=(0,\ 1)

\displaystyle t_{1}=(0,\ 01,\ 1)

\displaystyle t_{2}=(0,\ 001,\ 01,\ 011,\ 1)

\displaystyle t_{3}=(0,\ 0001,\ 001,\ 00101,\ 01,\ 01011,\ 011,\ 0111,\ 1)

Remark. If $u\in t_{k}$ then ${\left|u\right|}\geq 1$ for any $k\geq 0$ . Now suppose $u,v$ are adjacent in $t_{k}$ and we have ${\left|uv\right|}\geq k+2$ . Then $t_{k+1}$ contains $u,uv,v$ from which we can construct $uuv$ and $uvv$ . But ${\left|uuv\right|}={\left|u\right|}+{\left|uv\right|}\geq 1+k+2=k+3$ and ${\left|uvv\right|}={\left|uv\right|}+{\left|v\right|}\geq k+2+1=k+3$ . Thus, by induction, we have shown that

\displaystyle{\left|uv\right|}\geq k+2\qquad\text{for any adjacent pair $u,v$ in $t_{k}$ and any $k\geq 0$.}

(16)
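The mapping $C$ and the bound (16) can be reproduced with a few lines of code. This is a minimal sketch with our own function names (`mingle` for $C$ , `sequence_t` for $t_{k}$ ).

```python
def mingle(words):
    """The mapping C: interleave each neighbouring pair with its concatenation."""
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        out.extend([left + right, right])
    return out


def sequence_t(k):
    """t_k = C^(k)((0, 1))."""
    t = ["0", "1"]
    for _ in range(k):
        t = mingle(t)
    return t


def satisfies_16(k):
    """Check |uv| >= k + 2 for every adjacent pair u, v in t_k."""
    t = sequence_t(k)
    return all(len(u + v) >= k + 2 for u, v in zip(t, t[1:]))


print(sequence_t(3))
print(all(satisfies_16(k) for k in range(8)))
```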

8.1 Long Common Prefixes

We gather the results needed to prove Proposition 17. Most of these results relate to the notion that if ${\left|x-y\right|}$ is small and $a,b$ are the $x$ - and $y$ -threshold words, then words $a,b$ usually have a long common prefix, although this is not always the case.

The following simple result is repeatedly used in the other Lemmas of this subsection.

Lemma 1.

Suppose $(0a1,0b1)$ is a standard pair. Then $a10b=b01a$ .

Proof.

As $(0a1,0b1)$ is a standard pair, $0a10b1=:0w1$ is a Christoffel word. As $0a1,0b1,0w1$ are Christoffel words, $a,b,w$ are palindromes. Thus $a10b=w=\tilde{w}=\tilde{b}01\tilde{a}=b01a.$ ∎
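Lemma 1 can be checked on small examples. The sketch below makes an assumption that is not stated in this section: that adjacent words in the sequences $t_{k}$ defined above form standard pairs, as in the Christoffel tree. Under that assumption, it tests $a10b=b01a$ , and that $a,b$ are palindromes, for every adjacent pair in which both words have the form $0a1,0b1$ . Function names are ours.

```python
def mingle(words):
    """The mapping C: interleave each neighbouring pair with its concatenation."""
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        out.extend([left + right, right])
    return out


def sequence_t(k):
    """t_k = C^(k)((0, 1))."""
    t = ["0", "1"]
    for _ in range(k):
        t = mingle(t)
    return t


def candidate_pairs(k):
    """Adjacent pairs (0a1, 0b1) in t_k, returned as their central words (a, b)."""
    t = sequence_t(k)
    return [
        (u[1:-1], v[1:-1])
        for u, v in zip(t, t[1:])
        if len(u) >= 2 and len(v) >= 2   # excludes the single letters 0 and 1
    ]


def lemma_1_holds(k):
    """Check a10b == b01a and that a, b are palindromes for all pairs in t_k."""
    return all(
        a + "10" + b == b + "01" + a and a == a[::-1] and b == b[::-1]
        for a, b in candidate_pairs(k)
    )


print(candidate_pairs(3))
print(all(lemma_1_holds(k) for k in range(2, 8)))
```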

If $(0a1,0b1)$ is a standard pair, then the interval $X_{0b1}$ is immediately to the left of $X_{0a1(0b1)^{\omega}}$ . Since the words $0b1$ and $0a1(0b1)^{\omega}$ can differ within the first few letters, continuity of $\lambda(x)$ at $x=\sup X_{0b1}$ is not obvious. Similarly, $X_{(0a1)^{\omega}0b1}$ is immediately to the left of $X_{0a1}$ . However, the factors $1-\beta^{{\left|(0a1)^{\omega}0b1\right|}}$ and $1-\beta^{{\left|0a1\right|}}$ appearing in the definitions of the corresponding Whittle indices are different for ${\left|a\right|}<\infty$ . Thus continuity of $\lambda(x)$ at $x=\inf X_{0a1}$ is not obvious either. The next two Lemmas address these questions.

Lemma 2.

Suppose $(0a1,0b1)$ is a standard pair and let $x=\phi_{10b}(x)$ . Then

\displaystyle\lambda({0b1},x)=\lambda({0a1(0b1)^{\omega}},x).

Proof.

The right-hand side $\lambda({0a1(0b1)^{\omega}},x)$ involves the sum

$\displaystyle S(10a1(0b1)^{\omega},x)$	$\displaystyle=S(10b01a(10b)^{\omega},x)$	by Lemma 1
	$\displaystyle=S(10b,x)+\beta^{{\left|10b\right|}}S(01a(10b)^{\omega},\phi_{10b}(x))$	by 13
	$\displaystyle=S(10b,x)+\beta^{{\left|10b\right|}}S(01a(10b)^{\omega},x)$	as $x=\phi_{10b}(x)$
	$\displaystyle=(1-\beta^{{\left|10b\right|}})S((10b)^{\omega},x)+\beta^{{\left|10b\right|}}S(01a(10b)^{\omega},x)$	$\displaystyle\text{by~\ref{eq:SaOmega}}.$	(17)

Now we note that repeated application of Lemma 1 gives

\displaystyle 01a(10b)^{\omega}=01a10b(10b)^{\omega}=01b\,01a(10b)^{\omega}=(01b)^{\omega}01a.

(18)

Thus

$\displaystyle\lambda({0a1(0b1)^{\omega}},x)$	$\displaystyle=\frac{1-\beta^{{\left|0a1(0b1)^{\omega}\right|}}}{1-\beta}\left(S((01a1(0b1)^{\omega})^{\omega},x)-S((10a1(0b1)^{\omega})^{\omega},x)\right)$
	$\displaystyle=\frac{S(01a1(0b1)^{\omega},x)-S(10a1(0b1)^{\omega},x)}{1-\beta}$	by 14
	$\displaystyle=\frac{1-\beta^{{\left|10b\right|}}}{1-\beta}\left(S(01a(10b)^{\omega},x)-S((10b)^{\omega},x)\right)$	by 17
	$\displaystyle=\frac{1-\beta^{{\left|10b\right|}}}{1-\beta}\left(S((01b)^{\omega},x)-S((10b)^{\omega},x)\right)$	by 18
	$\displaystyle=\lambda({0b1},x).$

This completes the proof. ∎

Lemma 3.

Suppose $(0a1,0b1)$ is a standard pair and let $x=\phi_{01a}(x)$ . Then

\displaystyle\lambda({(0a1)^{\omega}0b1},x)=\lambda({0a1},x).

Proof.

The left-hand side $\lambda({(0a1)^{\omega}0b1},x)$ involves the sum

$\displaystyle S(01(a10)^{\omega}0b1,x)$	$\displaystyle=S(01(a10)^{\omega},x)$	by 14
	$\displaystyle=S(01a,x)+\beta^{{\left|01a\right|}}S((10a)^{\omega},\phi_{01a}(x))$	by 13
	$\displaystyle=S(01a,x)+\beta^{{\left|01a\right|}}S((10a)^{\omega},x)$	as $x=\phi_{01a}(x)$
	$\displaystyle=(1-\beta^{{\left|01a\right|}})S((01a)^{\omega},x)+\beta^{{\left|01a\right|}}S((10a)^{\omega},x)$	$\displaystyle\text{by~\ref{eq:SaOmega}}.$	(19)

Thus

$\displaystyle\lambda({(0a1)^{\omega}0b1},x)$	$\displaystyle=\frac{1-\beta^{{\left|(0a1)^{\omega}0b1\right|}}}{1-\beta}\left(S((01(a10)^{\omega}0b1)^{\omega},x)-S((10(a10)^{\omega}0b1)^{\omega},x)\right)$
	$\displaystyle=\frac{1}{1-\beta}\left(S(01(a10)^{\omega}0b1,x)-S((10a)^{\omega},x)\right)$	by 14
	$\displaystyle=\frac{1-\beta^{{\left|01a\right|}}}{1-\beta}(S((01a)^{\omega},x)-S((10a)^{\omega},x))$	by 19
	$\displaystyle=\lambda({0a1},x).$

This completes the proof. ∎

To demonstrate continuity at other points, we will need to rely on the fact that nearby words often have a long common prefix as shown by the following two Lemmas.

Lemma 4.

Suppose $(0a1,0b1)$ is a subsequence of $t_{k}$ for some $k\geq 1$ . Then $0b01a$ is a prefix of both $(0a1)^{\omega}$ and $0b(01b)^{\omega}$ .

Proof.

Let $a=b\cdots$ indicate that $b$ is a prefix of word $a$ and consider the statements

\displaystyle A(a,b):(a10)^{\omega}=b\cdots\qquad\text{and}\qquad B(a,b):(b01)^{\omega}=a\cdots.

It suffices to show that $A(a,b)$ and $B(a,b)$ are true for any adjacent words $0a1,0b1$ in $t_{k}$ for $k\geq 0$ . This is because

	$\displaystyle A(a,b)$	$\displaystyle\Rightarrow(0a1)^{\omega}=0a10(a10)^{\omega}=0a10b\cdots=0b01a\cdots$
where the last equality follows from Lemma 1 and
	$\displaystyle B(a,b)$	$\displaystyle\Rightarrow 0b(01b)^{\omega}=0b01(b01)^{\omega}=0b01a\cdots$

which are the claims of the Lemma.

We shall use induction. Take $t_{2}=(0,001,01,011,1)$ as the base case. We must show that $A(0,\epsilon),B(0,\epsilon),A(\epsilon,1),B(\epsilon,1)$ are true. However these statements are respectively that $(010)^{\omega}=\epsilon\cdots,(01)^{\omega}=0\cdots,(10)^{\omega}=1\cdots,(101)^{\omega}=\epsilon\cdots$ and are all true.

Otherwise, say $A(a,b),B(a,b)$ are true for any adjacency $0a1,0b1$ in $t_{k}$ . Let $0a10b1=0c1$ so

\displaystyle c=a10b=b01a

using Lemma 1 again. Then the statements $A(a,c),B(a,c),A(c,b),B(c,b)$ are all true as

$\displaystyle(a10)^{\omega}$	$\displaystyle=a10(a10)^{\omega}=a10b\cdots=c\cdots$	by $A(a,b)$ and as $c=a10b$
$\displaystyle(c01)^{\omega}$	$\displaystyle=c\cdots=a\cdots$	as $c=a10b$
$\displaystyle(c10)^{\omega}$	$\displaystyle=c\cdots=b\cdots$	as $c=b01a$
$\displaystyle(b01)^{\omega}$	$\displaystyle=b01(b01)^{\omega}=b01a\cdots=c\cdots$	by $B(a,b)$ and as $c=b01a$ .

Thus $A(a,b),B(a,b)$ are true for all adjacencies $0a1,0b1$ in $t_{k+1}$ . This completes the proof. ∎
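The prefix claims of Lemma 4 can also be confirmed on small cases. This is a minimal sketch with hypothetical helper names; an infinite word $hp^{\omega}$ is represented by truncating it after enough repetitions of the period $p$ to cover the candidate prefix.

```python
def mingle(words):
    """Mapping C: interleave each neighbouring pair with its concatenation."""
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        out.extend((left + right, right))
    return tuple(out)


def t(k):
    """Return t_k = C^(k)((0, 1))."""
    seq = ("0", "1")
    for _ in range(k):
        seq = mingle(seq)
    return seq


def is_prefix_of_periodic(prefix, head, period):
    """True if `prefix` is a prefix of the infinite word head + period^omega."""
    reps = len(prefix) // len(period) + 1  # enough copies to exceed len(prefix)
    return (head + period * reps).startswith(prefix)


def lemma4_holds(a, b):
    """Check that 0b01a is a prefix of (0a1)^omega and of 0b(01b)^omega."""
    target = "0" + b + "01" + a
    return (is_prefix_of_periodic(target, "", "0" + a + "1")
            and is_prefix_of_periodic(target, "0" + b, "01" + b))


def lemma4_holds_on_level(k):
    """Check Lemma 4 for every adjacent pair 0a1, 0b1 in t_k."""
    seq = t(k)
    return all(lemma4_holds(u[1:-1], v[1:-1])
               for u, v in zip(seq, seq[1:])
               if len(u) >= 2 and len(v) >= 2)
```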

Lemma 5.

Suppose $0a1,0b1$ are adjacent in $t_{k}$ and that $0c1$ lies strictly between them in $t_{k^{\prime}}$ for some $0<k<k^{\prime}$ . Then $0c1=0b01a\cdots$ .

Proof.

The interval of $t_{k^{\prime}}$ between $0a1,0b1$ is constructed from $0a1,0b1$ in exactly the same way as $t_{k^{\prime}-k}$ was constructed from $0,1$ . Thus $0c1=(0a1)^{q}0b1\cdots$ for some positive integer $q$ . Now recall that $0b01a=0a10b$ by Lemma 1. Thus $0c1=(0a1)^{q-1}0a10b1\cdots=(0a1)^{q-1}0b01a1\cdots=0b(01a)^{q}1\cdots=0b01a\cdots$ as claimed. ∎
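Lemma 5 lends itself to the same kind of exhaustive check for small $k<k^{\prime}$. The sketch below uses hypothetical function names and returns the list of counterexamples, which the Lemma predicts to be empty.

```python
def mingle(words):
    """Mapping C: interleave each neighbouring pair with its concatenation."""
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        out.extend((left + right, right))
    return tuple(out)


def t(k):
    """Return t_k = C^(k)((0, 1))."""
    seq = ("0", "1")
    for _ in range(k):
        seq = mingle(seq)
    return seq


def lemma5_counterexamples(k, k_prime):
    """Words of t_{k'} strictly between adjacent 0a1, 0b1 of t_k that do not start with 0b01a."""
    coarse, fine = t(k), t(k_prime)
    bad = []
    for u, v in zip(coarse, coarse[1:]):
        if len(u) < 2 or len(v) < 2:
            continue  # skip the end words 0 and 1, which are not of the form 0a1
        prefix = "0" + v[1:-1] + "01" + u[1:-1]
        lo, hi = fine.index(u), fine.index(v)
        bad.extend(c for c in fine[lo + 1:hi] if not c.startswith(prefix))
    return bad
```

For instance, the only word of $t_{3}$ strictly between $001$ and $01$ is $00101$, which begins with $0b01a=0010$.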

Although the existence of a long common prefix for nearby words suggests continuity, to prove anything we must bound the residual after removing the long common prefix. The following Lemma is one way to achieve this.

Lemma 6.

Suppose $x\geq y\geq 0$ , let $0w1$ be the $x$ -threshold word and let $(01w)^{\omega}=su,(10w)^{\omega}=s^{\prime}u^{\prime}$ where ${\left|s\right|}={\left|s^{\prime}\right|}$ . Then ${\left|S(u,\phi_{s}(y))-S(u^{\prime},\phi_{s^{\prime}}(y))\right|}\leq\frac{x+1}{1-\beta}.$

Proof.

The highest point on the orbits $(\phi_{((01w)^{\omega})_{1:k}}(x):k\geq 0)$ and $(\phi_{((10w)^{\omega})_{1:k}}(x):k\geq 0)$ is $x+1$ since $0w1$ is the $x$ -threshold word. The terms $a_{k},b_{k}$ of the discounted sums

S(u,\phi_{s}(y))=:\sum_{k=0}^{\infty}\beta^{k}a_{k}

and

S(u^{\prime},\phi_{s^{\prime}}(y))=:\sum_{k=0}^{\infty}\beta^{k}b_{k}

are taken from the orbits $(\phi_{((01w)^{\omega})_{1:k}}(y):k\geq 0)$ and $(\phi_{((10w)^{\omega})_{1:k}}(y):k\geq 0)$ , and $\phi_{u^{\prime\prime}}(x)\geq\phi_{u^{\prime\prime}}(y)$ for any word $u^{\prime\prime}$ as $x\geq y$ . Therefore the terms $a_{k},b_{k}$ are also no higher than $\phi_{0}(x)\leq x+1$ . Furthermore, the terms $a_{k},b_{k}$ are non-negative, so that ${\left|a_{k}-b_{k}\right|}\leq x+1$ . Thus ${\left|S(u,\phi_{s}(y))-S(u^{\prime},\phi_{s^{\prime}}(y))\right|}\leq\sum_{k=0}^{\infty}\beta^{k}{\left|a_{k}-b_{k}\right|}\leq\sum_{k=0}^{\infty}\beta^{k}(x+1)=\frac{x+1}{1-\beta}.$ ∎

Although it is clear that $\lambda(\pi,x)$ is continuous, a bound on its slope is helpful.

Lemma 7.

Suppose $x\geq 0$ and that $0w1$ is a valid word. Then ${\left|\lambda^{\prime}(0w1,x)\right|}\leq\frac{1}{(1-\beta)^{2}}$ .

Proof.

The definition of $\lambda(0w1,x)$ gives

\displaystyle{\left|\lambda^{\prime}(0w1,x)\right|}\leq\frac{1-\beta^{{\left|0w1\right|}}}{1-\beta}\sum_{k=0}^{\infty}\beta^{k}{\left|\phi_{((01w)^{\omega})_{1:k}}^{\prime}(x)-\phi_{((10w)^{\omega})_{1:k}}^{\prime}(x)\right|}\leq\frac{1}{1-\beta}\sum_{k=0}^{\infty}\beta^{k}=\frac{1}{(1-\beta)^{2}}

where the second inequality follows as $0\leq\beta^{{\left|0w1\right|}}<1$ and $0\leq\phi_{u}^{\prime}(x)\leq 1$ for any word $u$ since $0\leq\phi_{1}^{\prime}(x)\leq\phi_{0}^{\prime}(x)\leq 1$ . ∎

We use one more result about $\phi_{0},\phi_{1}$ of the main paper.

Lemma 8.

Suppose $\phi_{0}(x)$ and $\phi_{1}(x)$ are as in the main paper and $x\in\mathbb{R}_{+}$ . Then $\phi_{01}(x)<\phi_{10}(x)$ .

Proof.

The definitions of $\phi_{0},\phi_{1}$ give

	$\displaystyle\phi_{10}(x)-\phi_{01}(x)=$
	$\displaystyle\quad(b-a)\frac{(ab+b+a)x^{2}+(2ab+3b+3a+2)x+ab+2b+2a+3}{((ab+b+a)x+ab+b+2a+1)((ab+b+a)x+ab+2b+a+1)}$

which is positive as $b>a$ and $x\geq 0$ . ∎
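The closed form in the proof of Lemma 8 can be checked with exact rational arithmetic. The maps used below, $\phi_{0}(x)=\frac{x+1}{ax+a+1}$ and $\phi_{1}(x)=\frac{x+1}{bx+b+1}$ with letters applied from left to right, are an assumption inferred from the matrices $F,G$ and the vector $v(x)$ of this supplement; the main paper gives the authoritative definitions. Function names are hypothetical.

```python
from fractions import Fraction


def phi(letter, x, a, b):
    """Assumed Mobius map for one letter: (x + 1) / (c x + c + 1), with c = a for 0 and c = b for 1."""
    c = a if letter == "0" else b
    return (x + 1) / (c * x + c + 1)


def phi_word(word, x, a, b):
    """Apply the letters of `word` from left to right; pass Fractions for exact results."""
    for letter in word:
        x = phi(letter, x, a, b)
    return x


def lemma8_closed_form(x, a, b):
    """Right-hand side of the display in the proof of Lemma 8."""
    s = a * b + b + a
    num = s * x**2 + (2 * a * b + 3 * b + 3 * a + 2) * x + a * b + 2 * b + 2 * a + 3
    den = (s * x + a * b + b + 2 * a + 1) * (s * x + a * b + 2 * b + a + 1)
    return (b - a) * num / den
```

Under these assumptions, $a=1$, $b=2$, $x=0$ gives $\phi_{10}(0)=4/7$ and $\phi_{01}(0)=3/8$, whose difference $11/56$ agrees with the closed form.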

Our proof of continuity will rely on the standard $(\epsilon,\delta)$ definition in which we will put $\delta=l_{k}$ where $l_{k}$ is defined in the following Lemma.

Lemma 9.

For any $\epsilon>0$ there is a $k<\infty$ such that $0<l_{k}:=\inf\{{\left|X_{\pi}\right|}:\pi\in t_{k}\}<\epsilon.$

Proof.

Say $0a1,0b1$ are adjacent in $t_{k}$ . Then by construction of $t_{k+i}$ , the gap $(z_{10b},z_{01a})$ contains $2^{i}-1$ intervals corresponding to words of $t_{k+i}\backslash t_{k}$ . As these intervals do not overlap, the shortest of them is at most $\frac{z_{01a}-z_{10b}}{2^{i}-1}$ in length. Thus $\lim_{k\rightarrow\infty}l_{k}=0$ . This demonstrates the existence of a $k<\infty$ such that $l_{k}<\epsilon$ .

To show that $l_{k}>0$ for finite $k$ , we shall demonstrate that assuming $l_{k}=0$ leads to a contradiction. If $l_{k}=0$ then there is some word $0w1\in t_{k}$ such that $z_{10w}=z_{01w}=:x$ . Therefore $\phi_{10w}(x)=\phi_{01w}(x)$ . Now in $\mathbb{R}_{+}$ , functions $\phi_{0}(x),\phi_{1}(x)$ have inverses, so $\phi_{w}^{-1}(x)$ is well-defined. Therefore

\displaystyle\phi_{10}(x)=\phi_{w}^{-1}\circ\phi_{10w}(x)=\phi_{w}^{-1}\circ\phi_{01w}(x)=\phi_{01}(x)

which contradicts Lemma 8 as $x\geq 0$ . ∎

8.2 Proof of Continuity

Proof.

We wish to show that for any $\epsilon>0$ , there exists a $\delta>0$ such that for any ${\left|x-y\right|}<\delta$ we have $\Delta:={\left|\lambda(x)-\lambda(y)\right|}<\epsilon$ . Without loss of generality we assume that $x\geq y$ .

Specifically, we shall put $\delta=l_{k}>0$ where $l_{k}$ is as defined in Lemma 9 and $k$ is any positive integer such that $\frac{l_{k}}{(1-\beta)^{2}}<\frac{\epsilon}{2}$ and such that $2\frac{x+1}{(1-\beta)^{2}}\beta^{k+1}<\frac{\epsilon}{2}$ . The existence of such a $k$ is guaranteed by Lemma 9 and because $\beta\in(0,1)$ .
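To illustrate the second requirement on $k$, the sketch below finds the smallest positive integer $k$ with $2\frac{x+1}{(1-\beta)^{2}}\beta^{k+1}<\frac{\epsilon}{2}$ using exact rationals. It does not address the first requirement, which depends on the interval lengths $l_{k}$; the function name is hypothetical.

```python
from fractions import Fraction


def smallest_tail_index(x, beta, eps):
    """Smallest positive integer k with 2 (x + 1) beta^(k+1) / (1 - beta)^2 < eps / 2."""
    x, beta, eps = Fraction(x), Fraction(beta), Fraction(eps)
    if not (0 < beta < 1 and eps > 0 and x >= 0):
        raise ValueError("need 0 < beta < 1, eps > 0 and x >= 0")
    k = 1
    # The left-hand side decays geometrically in k, so the loop terminates.
    while 2 * (x + 1) * beta ** (k + 1) / (1 - beta) ** 2 >= eps / 2:
        k += 1
    return k
```

For example, with $x=1$, $\beta=\frac{1}{2}$ and $\epsilon=1$ the left-hand side is $16\cdot 2^{-(k+1)}$, which first drops below $\frac{1}{2}$ at $k=5$.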

Let $0a1,0b1$ be the $x$ - and $y$ -threshold words. If these words are the same then

\displaystyle\Delta={\left|\lambda(0a1,x)-\lambda(0a1,y)\right|}\leq{\left|y-x\right|}\sup_{z\in[y,x]}{\left|\lambda^{\prime}(0a1,z)\right|}\leq\frac{{\left|y-x\right|}}{(1-\beta)^{2}}\leq\frac{l_{k}}{(1-\beta)^{2}}<\frac{\epsilon}{2}

where the second inequality follows from Lemma 7, the third from ${\left|y-x\right|}<\delta=l_{k}$ and the fourth from the definition of $k$ .

Otherwise $0a1\neq 0b1$ . In this case, let $(0e1,0b1)$ be the standard pair for word $0b1$ , and let $\underline{a}=\phi_{01a}(\underline{a})$ and $\bar{b}=\phi_{10b}(\bar{b})$ be the fixed points required by Lemmas 3 and 2 respectively. Noting that $y\leq\bar{b}\leq\underline{a}\leq x$ , our strategy is to write

	$\displaystyle\Delta$	$\displaystyle={\left|\Delta_{1}+\Delta_{2}+\Delta_{3}+\Delta_{4}+\Delta_{5}+\Delta_{6}\right|}$
	$\displaystyle\Delta_{1}$	$\displaystyle:=\lambda(0b1,y)-\lambda(0b1,\bar{b})$
	$\displaystyle\Delta_{2}$	$\displaystyle:=\lambda(0b1,\bar{b})-\lambda(0e1(0b1)^{\omega},\bar{b})$
	$\displaystyle\Delta_{3}$	$\displaystyle:=\lambda(0e1(0b1)^{\omega},\bar{b})-\lambda((0a1)^{\omega},\bar{b})$
	$\displaystyle\Delta_{4}$	$\displaystyle:=\lambda((0a1)^{\omega},\bar{b})-\lambda((0a1)^{\omega},\underline{a})$
	$\displaystyle\Delta_{5}$	$\displaystyle:=\lambda((0a1)^{\omega},\underline{a})-\lambda(0a1,\underline{a})$
	$\displaystyle\Delta_{6}$	$\displaystyle:=\lambda(0a1,\underline{a})-\lambda(0a1,x).$

Lemma 7 and the choice of $\delta$ give

\displaystyle{\left|\Delta_{1}\right|}+{\left|\Delta_{4}\right|}+{\left|\Delta_{6}\right|}\leq\frac{\bar{b}-y+\underline{a}-\bar{b}+x-\underline{a}}{(1-\beta)^{2}}<\frac{l_{k}}{(1-\beta)^{2}}\leq\frac{\epsilon}{2}

(20)

while Lemmas 2 and 3 give

\displaystyle\Delta_{2}=\Delta_{5}=0.

(21)

It remains to consider $\Delta_{3}$ . It follows from the definition of $l_{k}$ , that for some adjacent words $0c1,0d1$ in $t_{k}$ : either $0a1=0c1$ or $0a1$ is a word strictly between $0c1$ and $0d1$ in the sense of Lemma 5; and that $0e1(0b1)^{\omega}$ is a word strictly between $0c1$ and $0d1$ . Thus by Lemma 5 we have $(0a1)^{\omega}=0pu$ and $0e1(0b1)^{\omega}=0pv$ where $p:=d01c$ and $u,v$ are the appropriate suffixes. Therefore the definition of $\lambda(w,x)$ gives

$\displaystyle{\left|\Delta_{3}\right|}$	$\displaystyle={\left|\lambda((0a1)^{\omega},\bar{b})-\lambda(0e1(0b1)^{\omega},\bar{b})\right|}$
	$\displaystyle=\frac{1}{1-\beta}\begin{vmatrix}S(01p,\bar{b})+\beta^{{\left|01p\right|}}S(u,\phi_{01p}(\bar{b}))-S(10p,\bar{b})-\beta^{{\left|10p\right|}}S(u,\phi_{10p}(\bar{b}))\\ -S(01p,\bar{b})-\beta^{{\left|01p\right|}}S(v,\phi_{01p}(\bar{b}))+S(10p,\bar{b})+\beta^{{\left|10p\right|}}S(v,\phi_{10p}(\bar{b}))\end{vmatrix}$
	$\displaystyle=\frac{\beta^{{\left|01p\right|}}}{1-\beta}{\left|S(u,\phi_{01p}(\bar{b}))-S(u,\phi_{10p}(\bar{b}))-S(v,\phi_{01p}(\bar{b}))+S(v,\phi_{10p}(\bar{b}))\right|}$
	$\displaystyle\leq\frac{\beta^{{\left|01p\right|}}}{1-\beta}\left({\left|S(u,\phi_{01p}(\bar{b}))-S(u,\phi_{10p}(\bar{b}))\right|}+{\left|S(v,\phi_{01p}(\bar{b}))-S(v,\phi_{10p}(\bar{b}))\right|}\right)$
	$\displaystyle\leq\frac{\beta^{{\left|01p\right|}}}{1-\beta}\left(\frac{\underline{a}+1}{1-\beta}+\frac{\bar{b}+1}{1-\beta}\right)$
	$\displaystyle\leq\frac{\beta^{k+1}}{(1-\beta)^{2}}2(x+1)$
	$\displaystyle<\frac{\epsilon}{2}$	(22)

where the last four inequalities follow from the triangle inequality, from Lemma 6, from equation 16 coupled with the fact that $\bar{b}\leq\underline{a}\leq x$ and finally from the definition of $k$ .

Finally, coupling 20, 21 and 22 and using the triangle inequality gives

\displaystyle\Delta<\frac{\epsilon}{2}+0+\frac{\epsilon}{2}=\epsilon.

This completes the proof. ∎

9 Properties of the Linear-System Orbits $M(w)$

Recall the definitions about words from the main paper, particularly that $\tilde{w}$ is the reverse of $w$ . Also, recall the definitions of matrices $F,G,K,M(w)$ . The first of the following propositions is used to prove the second. The second appears in the main paper.

Proposition 18.

Suppose $w,w^{\prime}$ are any words. Then

1. $\det(M(w))=1$ ,
2. $M(\tilde{w})=KM(w)^{-1}K$ ,
3. $M(w)=\begin{pmatrix}e&f\\ \frac{eh-1}{f}&h\end{pmatrix}$ for some $e,f,h\in\mathbb{R}$ ,
4. $M(w)-M(\tilde{w})=\lambda K$ for some $\lambda\in\mathbb{R}$ ,
5. $\displaystyle\frac{[M(w01w^{\prime})]_{22}}{[M(w01w^{\prime})]_{21}}\geq\frac{[M(w10w^{\prime})]_{22}}{[M(w10w^{\prime})]_{21}}$ ,
6. $[M(w)]_{22}\geq[M(w)]_{21}$ .

Proof.

Claim 1. As $\det(F)=\det(G)=1$ , we have $\det(M(w))=\prod_{i=1}^{{\left|w\right|}}\det(M(w_{i}))=1$ .
Claim 2. The definitions of $F,G,K$ give $KF=F^{-1}K,KG=G^{-1}K$ . Thus $KM(w)=M(w_{{\left|w\right|}})^{-1}\cdots M(w_{1})^{-1}K=M(\tilde{w})^{-1}K$ . The result follows as $K^{2}=I$ .
Claim 3. Put $M(w)=:\begin{pmatrix}e&f\\ g&h\end{pmatrix}$ and solve $\det(M(w))=1=eh-fg$ for $g$ .
Claim 4. Substituting Claim 2 and Claim 3 in Claim 4 gives $M(w)-KM(w)^{-1}K=(h-e-g)K.$
Claim 5. Put $M:=M(w),N:=M(w^{\prime})$ . We calculate

	$\displaystyle[NGFM]_{22}[NFGM]_{21}-[NGFM]_{21}[NFGM]_{22}$
	$\displaystyle\quad=(b-a)(M_{11}M_{22}-M_{12}M_{21})((ab+b+a)N_{22}^{2}+(b+a+2)N_{21}N_{22}+N_{21}^{2})\geq 0$

as $b>a\geq 0$ , $\det(M)=1$ and $N\geq 0$ . The result follows as $NFGM\geq 0$ and $NGFM\geq 0$ .
Claim 6. If $w=\epsilon$ then $[M(w)]_{22}-[M(w)]_{21}=1\geq 0$ . Otherwise we use induction on ${\left|w\right|}$ to show that $M(w)v\geq 0$ where $v:=(-1,1)^{T}$ . In the base case $w\in\{0,1\}$ so

\displaystyle M(w)v=\begin{pmatrix}1&1\\ c&1+c\end{pmatrix}\begin{pmatrix}-1\\ 1\end{pmatrix}=\begin{pmatrix}0\\ 1\end{pmatrix}\geq 0\qquad\text{for some $c\in\{a,b\}$}.

For the inductive step, assume $w\in\{u0,u1\}$ for some word $u$ satisfying $M(u)v\geq 0$ . Then

\displaystyle M(w)v=\begin{pmatrix}1&1\\ c&1+c\end{pmatrix}M(u)v\geq 0\qquad\text{for some $c\in\{a,b\}$.}

As $[M(w)v]_{2}=[M(w)]_{22}-[M(w)]_{21}$ , this completes the proof. ∎
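Claims 1, 2, 4 and 6 of Proposition 18 can be spot-checked numerically. The sketch below assumes $M(0)=F$, $M(1)=G$ and $M(w)=M(w_{{\left|w\right|}})\cdots M(w_{1})$, as suggested by the proofs of Claims 2 and 5, and takes $K=\begin{pmatrix}-1&-1\\ 0&1\end{pmatrix}$, which is inferred from the identity $GF-FG=(b-a)K$ used later; the main paper holds the authoritative definitions. Function names are hypothetical.

```python
K = ((-1, -1), (0, 1))  # assumption: inferred from GF - FG = (b - a) K


def mul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def M(word, a, b):
    """M(w) = M(w_n) ... M(w_1), with M(0) = F and M(1) = G (assumed convention)."""
    out = ((1, 0), (0, 1))
    for letter in word:
        c = a if letter == "0" else b
        out = mul(((1, 1), (c, 1 + c)), out)
    return out


def check_prop18(word, a, b):
    """Return True if Claims 1, 2, 4 and 6 hold for this word and these parameters."""
    m, m_rev = M(word, a, b), M(word[::-1], a, b)
    (e, f), (g, h) = m
    inv = ((h, -f), (-g, e))  # inverse of a determinant-one 2x2 matrix
    lam = h - e - g           # multiplier from the proof of Claim 4
    diff = tuple(tuple(m[i][j] - m_rev[i][j] for j in range(2)) for i in range(2))
    lam_K = tuple(tuple(lam * K[i][j] for j in range(2)) for i in range(2))
    return (e * h - f * g == 1                  # Claim 1
            and m_rev == mul(mul(K, inv), K)    # Claim 2
            and diff == lam_K                   # Claim 4
            and h >= g)                         # Claim 6
```

Integer or `fractions.Fraction` parameters keep the arithmetic exact, so the equalities are tested without rounding error.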

Proposition 19.

Suppose $w$ is a word, $p$ is a palindrome and $n\in\mathbb{Z}_{+}$ . Then

1. $M(p)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix}$ for some $f,h\in\mathbb{R}$ ,
2. $\text{tr}(M(10p))=\text{tr}(M(01p))$ ,
3. If $u\in\{p(10p)^{n},(10p)^{n}10\}$ then $M(u)-M(\tilde{u})=\lambda K$ for some $\lambda\in\mathbb{R}_{-}$ ,
4. If $w$ is a prefix of $p$ then $[M(p(10p)^{n}10w)]_{22}\leq[M(p(01p)^{n}01w)]_{22}$ ,
5. $[M((10p)^{n}10w)]_{21}\geq[M((01p)^{n}01w)]_{21}$ ,
6. $[M((10p)^{n}1)]_{21}\geq[M((01p)^{n}0)]_{21}$ .

Proof.

In this proof, we refer to Claim $k$ of Proposition 18 as P $k$ .
Claim 1. P2 gives $M(p)=KM(p)^{-1}K$ as $p=\tilde{p}$ . But in the notation of P3, $[M(p)]_{11}=[KM(p)^{-1}K]_{11}$ says $e=h-(eh-1)/f$ . Solve this for $e$ and substitute in P3.
Claim 2. Noting that $GF-FG=(b-a)K$ , the notation of Claim 1 gives

\displaystyle\text{tr}(M(01p))-\text{tr}(M(10p))=\text{tr}(M(p)(GF-FG))=(b-a)\text{tr}\left(\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix}K\right)=0.

Claim 3. Note we can move from $u$ to $\tilde{u}$ just by swapping some $10$ for $01$ . So, repeated application of P5 gives the inequality $\frac{[M(u)]_{22}}{[M(u)]_{21}}\leq\frac{[M(\tilde{u})]_{22}}{[M(\tilde{u})]_{21}}.$ But the denominators of this inequality are equal (and non-negative) as P4 gives $[M(u)]_{21}-[M(\tilde{u})]_{21}=\lambda^{\prime}K_{21}=0$ for some $\lambda^{\prime}\in\mathbb{R}$ . Thus this inequality reduces to $[M(u)]_{22}\leq[M(\tilde{u})]_{22}$ . Yet P4 also gives $[M(u)-M(\tilde{u})]_{22}=\lambda K_{22}$ which combined with the previous sentence says that $\lambda K_{22}\leq 0$ . As $K_{22}=1$ , this gives $\lambda\in\mathbb{R}_{-}$ .
Claim 4. Let $s$ be the corresponding suffix so $p=ws$ and

M(p(10p)^{n}10w)-M(p(01p)^{n}01w)=M(s)^{-1}(M(p(10p)^{n+1})-M(p(01p)^{n+1}))=:A.

But Claim 3 with $u=p(10p)^{n+1}$ gives

\displaystyle[A]_{22}=\underbrace{\lambda[M(s)^{-1}K]_{22}}_{\text{for some $\lambda\leq 0$}}=\underbrace{\lambda[KM(\tilde{s})]_{22}}_{\text{by P2}}=\lambda([M(\tilde{s})]_{22}-[M(\tilde{s})]_{21})\leq{\underbrace{0}}_{\text{by P6.}}

Claim 5. As $M(w)\geq 0$ , Claim 3 with $u=(10p)^{n}10$ gives

\displaystyle[M(w)(M((10p)^{n}10)-M((01p)^{n}01))]_{21}=\lambda[M(w)K]_{21}=\lambda[-M(w)]_{21}\geq 0.

Claim 6. Let $E:=\begin{pmatrix}0&0\\ 1&1\end{pmatrix}.$ Then $G-F=(b-a)E\geq 0$ , so that

	$\displaystyle[GM((10p)^{n})-FM((01p)^{n})]_{21}$	$\displaystyle=[(b-a)EM((10p)^{n})+FM((10p)^{n})-FM((01p)^{n})]_{21}$
		$\displaystyle\geq[M((10p)^{n}0)-M((01p)^{n}0)]_{21}\geq 0$

by Claim 5. This completes the proof. ∎
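Claims 2 and 3 of Proposition 19 can be spot-checked in the same spirit. As before, the sketch assumes $M(0)=F$, $M(1)=G$, $M(w)=M(w_{{\left|w\right|}})\cdots M(w_{1})$ and $K=\begin{pmatrix}-1&-1\\ 0&1\end{pmatrix}$ (inferred from $GF-FG=(b-a)K$ ); the function names are hypothetical.

```python
K = ((-1, -1), (0, 1))  # assumption: inferred from GF - FG = (b - a) K


def mul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def M(word, a, b):
    """M(w) = M(w_n) ... M(w_1), with M(0) = F and M(1) = G (assumed convention)."""
    out = ((1, 0), (0, 1))
    for letter in word:
        c = a if letter == "0" else b
        out = mul(((1, 1), (c, 1 + c)), out)
    return out


def trace_gap(p, a, b):
    """tr(M(01p)) - tr(M(10p)); Claim 2 says this is zero when p is a palindrome."""
    m01, m10 = M("01" + p, a, b), M("10" + p, a, b)
    return (m01[0][0] + m01[1][1]) - (m10[0][0] + m10[1][1])


def reversal_multiplier(u, a, b):
    """Return lambda with M(u) - M(reversed u) = lambda K, or None if no such lambda exists."""
    m, m_rev = M(u, a, b), M(u[::-1], a, b)
    diff = tuple(tuple(m[i][j] - m_rev[i][j] for j in range(2)) for i in range(2))
    lam = diff[1][1]  # K_22 = 1, so the (2,2) entry reads off lambda
    lam_K = tuple(tuple(lam * K[i][j] for j in range(2)) for i in range(2))
    return lam if diff == lam_K else None
```

With $p=\epsilon$ and $u=10$ the multiplier is $-(b-a)$, which is non-positive as Claim 3 requires.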

10 Majorisation

In the main paper, we used one result about majorisation which is similar, but not identical, to results in Marshall, Olkin and Arnold (2011). Let us prove that result.

Proposition 20.

Proof.

As the claim relates to $x_{(i)}$ and $y_{(i)}$ we assume that $x_{i}$ and $y_{i}$ are in ascending order.

Marshall et al (3H2B, page 133) says that if $g:\mathcal{A}\rightarrow\mathbb{R}$ is a non-decreasing and convex function on $\mathcal{A}\subseteq\mathbb{R}$ and $(u_{1},\dots,u_{m})$ is a non-increasing and non-negative sequence, then for all non-increasing sequences $(p_{1},\dots,p_{m})$ the function $\phi(p):=\sum_{i=1}^{m}u_{i}g(p_{i})$ is Schur-convex.

Indeed the function $f$ is increasing and convex for $p\in\mathbb{R}_{-}$ (such as $p=-x$ and $p=-y$ ) and $(\beta,\dots,\beta^{m})$ is a non-increasing and non-negative sequence for $\beta\in[0,1]$ . Thus for all non-increasing sequences $(p_{1},\dots,p_{m})$ on $\mathbb{R}_{-}^{m}$ the function $\psi(p):=\sum_{i=1}^{m}\beta^{i}f(p_{i})$ is Schur-convex.

Recall (ibid, page 12) that $a\in\mathbb{R}^{m}$ is said to be weakly submajorised by $b\in\mathbb{R}^{m}$ , written $a\prec_{w}b$ if

\displaystyle\sum_{i=1}^{k}a_{[i]}\leq\sum_{i=1}^{k}b_{[i]},\quad k=1,\dots,m\qquad\text{where $a_{[i]}$ denotes $a$ in descending order}

and that $a\prec_{w}b\Leftrightarrow-a\prec^{w}-b$ (ibid, page 13).

However (ibid, 3A8, page 87) if $\phi(p)$ is a real function on $\mathcal{A}\subset\mathbb{R}^{m}$ which is non-decreasing in each argument $p_{i}$ and Schur-convex on $\mathcal{A}$ and $p\prec_{w}q$ on $\mathcal{A}$ then $\phi(p)\leq\phi(q)$ .

Indeed, the function $\psi(p)=\sum_{i=1}^{m}\beta^{i}f(p_{i})$ is a real function on $\mathbb{R}_{-}^{m}$ which is non-decreasing in each argument and Schur-convex on $\mathbb{R}_{-}^{m}$ for all non-increasing sequences $(p_{1},\dots,p_{m})$ . Furthermore, $-y\prec_{w}-x$ as $x\prec^{w}y$ . Therefore $\psi(y)=\psi(-y)\leq\psi(-x)=\psi(x)$ as claimed. ∎
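The definition of weak submajorisation recalled above translates directly into a partial-sum comparison. The following is a minimal sketch with a hypothetical function name; it implements only the definition and not the proposition itself.

```python
from itertools import accumulate


def weakly_submajorised(a, b):
    """True if a is weakly submajorised by b: partial sums of the descending
    rearrangement of a never exceed those of b."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    sums_a = accumulate(sorted(a, reverse=True))
    sums_b = accumulate(sorted(b, reverse=True))
    return all(sa <= sb for sa, sb in zip(sums_a, sums_b))
```

For example, $(1,2,3)$ is weakly submajorised by $(3,2,2)$ since the partial sums $3,5,6$ are bounded by $3,5,7$, whereas $(3,0)$ is not weakly submajorised by $(2,2)$.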

11 Clarification of Theorem 1 for $0\leq x\leq y_{1}$ or $y_{0}\leq x<\infty$

Recall the following definitions and assumption from the main paper

\displaystyle F:=\begin{pmatrix}1&1\\ a&1+a\end{pmatrix},\quad G:=\begin{pmatrix}1&1\\ b&1+b\end{pmatrix},\quad E:=\begin{pmatrix}0&0\\ 1&1\end{pmatrix},\quad v(x):=\begin{pmatrix}x\\ 1\end{pmatrix},\quad b>a\geq 0.

If $0\leq x\leq y_{1}$ or $y_{0}\leq x<\infty$ then the relevant linear systems, (9) in the main paper, are

\displaystyle\left.\begin{aligned} (M(1^{k+1})-M(01^{k}))v(x)&=(G-F)G^{k}v(x)=(b-a)EG^{k}v(x)\geq 0\\ (M(10^{k})-M(0^{k+1}))v(x)&=(G-F)F^{k}v(x)=(b-a)EF^{k}v(x)\geq 0\end{aligned}\right\}\quad\text{for $k\in\mathbb{Z}_{+}$}

where both inequalities follow as $E,F,G$ are all $\geq 0$ , as $b>a$ and as $x\geq\min\{y_{0},y_{1}\}\geq 0.$ Therefore all cumulative sums of the above expressions are non-negative so the derivative of the numerator of the Whittle index is non-negative by the same weak-supermajorisation argument as in the main paper.

Meanwhile, the denominator of the index in these cases is

\displaystyle\sum_{k=0}^{\infty}\beta^{k}((1^{\omega})_{k+1}-(01^{\omega})_{k+1})=\beta=\sum_{k=0}^{\infty}\beta^{k}((10^{\omega})_{k+1}-(0^{\omega})_{k+1})

which is non-negative. Therefore the rest of the proof of Theorem 1 follows as in the main paper.

In fact we could say that the majorisation point, which is $\phi_{w}(0)$ for words $0w1$ in the main paper, is $-1$ in both cases. Indeed, Claim 6 of Proposition 4 of the main paper says that $Fv(-1)=Gv(-1)=v(0)$ . Also, $Ev(-1)=(0,0)^{T}.$ Thus for all $k\in\mathbb{Z}_{+}$ , $EG^{k}v(-1)\geq EF^{k}v(-1)\geq 0$ whereas $Ev(-1-\epsilon)<0$ for any $\epsilon>0.$
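The two linear systems of this section and the remarks about $v(-1)$ are easy to evaluate directly from the definitions of $F,G,E$ and $v(x)$ recalled above. The sketch below uses hypothetical helper names and returns the vectors $(G-F)G^{k}v(x)$ and $(G-F)F^{k}v(x)$ for $k=0,\dots,k_{\max}$.

```python
E = ((0, 0), (1, 1))


def matvec(A, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return tuple(sum(A[i][j] * v[j] for j in range(2)) for i in range(2))


def step_matrix(c):
    """F for c = a and G for c = b."""
    return ((1, 1), (c, 1 + c))


def edge_differences(a, b, x, kmax):
    """Return [((G - F) G^k v(x), (G - F) F^k v(x)) for k = 0..kmax]."""
    F, G = step_matrix(a), step_matrix(b)
    diff = ((0, 0), (b - a, b - a))  # G - F = (b - a) E
    vg = vf = (x, 1)
    out = []
    for _ in range(kmax + 1):
        out.append((matvec(diff, vg), matvec(diff, vf)))
        vg, vf = matvec(G, vg), matvec(F, vf)
    return out
```

For $x\geq 0$ every returned component is non-negative, and at $x=-1$ the $k=0$ entries vanish because $Ev(-1)=(0,0)^{T}$.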

