When are Kalman-Filter Restless Bandits Indexable?

Christopher Dance and Tomi Silander
Xerox Research Centre Europe, Grenoble, France
(May, 2015)
Abstract

We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.

1 Introduction

We study the problem of monitoring several time series so as to maintain a precise belief while minimising the cost of sensing. Such problems can be viewed as POMDPs with belief-dependent rewards [3] and their applications include active sensing [7], attention mechanisms for multiple-object tracking [22], as well as online summarisation of massive data from time-series [4]. Specifically, we discuss the restless bandit [24] associated with the discrete-time Kalman filter [19].

Restless bandits generalise bandit problems [6, 8] to situations where the state of each arm (project, site or target) continues to change even if the arm is not played. As with bandit problems, the states of the arms evolve independently given the actions taken, suggesting that there might be efficient algorithms for large-scale settings, based on calculating an index for each arm, which is a real number associated with the (belief-)state of that arm alone. However, while bandits always have an optimal index policy (select the arm with the largest index), it is known that no index policy can be optimal for some discrete-state restless bandits [17] and such problems are in general PSPACE-hard even to approximate to any non-trivial factor [10]. Further, in this paper we address restless bandits with real-valued rather than discrete states.

On the other hand, Whittle proposed a natural index policy for restless bandits [24], but this policy only makes sense when the restless bandit is indexable, as we now explain. Say we have $n$ restless bandits and we are constrained to play $m$ arms at each time. Whittle considered relaxing this constraint by only requiring that the time-average number of arms played is $m$. Now the optimal average cost for this relaxed problem is a lower bound on the optimal average cost for the original problem. Also, the relaxed problem can be separated into $n$ single-arm problems by the method of Lagrange multipliers, making it relatively easy to solve. In this separated version of the relaxed problem, each arm behaves identically to an arm in the original problem, except that an additional price $\lambda$ is charged each time the arm is played, where $\lambda$ corresponds to the Lagrange multiplier for the relaxed constraint. Now let us consider a family of optimal policies which achieves the optimal cost-to-go $Q_{i}(x,u;\lambda)$ for a single arm $i$ with price $\lambda$ and which takes actions $u=\pi_{i}(x;\lambda)$ when in state $x$, where $u=0$ means passive and $u=1$ means active. At first glance, we might intuitively suppose that it becomes less and less attractive to be active as the price $\lambda$ increases, so that as the price is increased beyond some value $\lambda_{i}(x)$, the optimal action switches from active to passive. At this price we are ambivalent between being active and passive, so that $Q_{i}(x,0;\lambda_{i}(x))=Q_{i}(x,1;\lambda_{i}(x))$. Such a value $\lambda_{i}(x)$ is called the Whittle index for arm $i$ in state $x$. Indeed if there is a family of optimal policies for which

\[
\pi_{i}(x;\lambda_{\text{hi}})\leq\pi_{i}(x;\lambda_{\text{lo}})\quad\text{for all states $x$ and all pairs of prices $\lambda_{\text{hi}}\geq\lambda_{\text{lo}}$}
\]

then an optimal solution to the relaxed problem for price $\lambda$ is to activate arm $i$ if and only if $\lambda<\lambda_{i}(x)$. If a restless bandit satisfies this condition, it is said to be indexable. It is important to note that some restless bandits are not indexable, so activating arm $i$ if and only if $\lambda<\lambda_{i}(x)$ does not correspond to an optimal solution to the relaxed problem. Indeed, in a study of small randomly-generated problems, Weber and Weiss [23] found that roughly 10% of problems were not indexable.

As a policy based on $\lambda_{i}(x)$ is so good for the relaxed problem when the arms are indexable, this motivates us to use $\lambda_{i}(x)$ as a heuristic for the original problem. This heuristic is called Whittle’s index policy and at each time it activates the $m$ arms with the highest indexes $\lambda_{i}(x)$. Further motivation for studying indexability is that for ordinary bandits the Whittle index reduces to the Gittins index, making the Whittle index policy optimal when only one arm may be active at each time, that is when $m=1$. More generally, Whittle’s index policy is not optimal for some restless bandit problems even when the arms are indexable, but indexability is still a rather useful concept, since if all arms are indexable and certain other conditions hold, Whittle’s policy is asymptotically optimal, as we now explain. Consider a sequence of restless bandit problems parameterised by the number of indexable arms $n$ and in which $m=\alpha n$ of the arms can be simultaneously active for some fixed $\alpha\in(0,1)$. Then as $n$ tends to infinity, the time-average cost per arm for Whittle’s index policy converges to the time-average cost per arm for an optimal policy, provided a certain fluid approximation has a unique fixed point. This result was first demonstrated by Weber and Weiss [23] who for simplicity of exposition only considered the symmetric case in which the $n$ arms have identical costs and transition probabilities. Recently, Verloop [20] extended this result to asymmetric cases involving multiple types of arms. Interestingly, this extension also covers cases where new arms arrive and old arms depart.

Restless bandits associated with scalar Kalman(-Bucy) filters in continuous time were recently shown to be indexable [12] and the corresponding discrete-time problem has attracted considerable attention over a long period [15, 11, 16, 21]. However, that attention has produced no satisfactory proof of indexability – even for scalar time-series and even if we assume that there is a monotone optimal policy for the single-arm problem, which is a policy that plays the arm if and only if the relevant belief-state exceeds some threshold (here the relevant belief-state is a posterior variance). Theorem 1 of this paper addresses that gap. After formalising the problem (Section 2), we describe the concepts and intuition (Section 3) behind the main result (Section 4). The main tools are mechanical words (which are not sufficiently well-known) and Schur convexity. As these tools are associated with rather general theorems, we believe that future work (Section 5) should enable substantial generalisation of our results.

2 Problem and Index

We consider the problem of tracking $N$ time-series, which we call arms, in discrete time. The state $Z_{i,t}\in\mathbb{R}$ of arm $i$ at time $t\in\mathbb{Z}_{+}$ evolves as a standard-normal random walk independent of everything but its immediate past ($\mathbb{Z}_{+},\mathbb{R}_{-}$ and $\mathbb{R}_{+}$ all include zero). The action space is $\mathcal{U}:=\{1,\dots,N\}$. Action $u_{t}=i$ makes an expensive observation $Y_{i,t}$ of arm $i$ which is normally-distributed about $Z_{i,t}$ with precision $b_{i}\in\mathbb{R}_{+}$ and we receive cheap observations $Y_{j,t}$ of each other arm $j$ with precision $a_{j}\in\mathbb{R}_{+}$ where $a_{j}<b_{j}$ and $a_{j}=0$ means no observation at all.

Let $Z_{t},Y_{t},\mathcal{H}_{t},\mathcal{F}_{t}$ be the state, observation, history and observed history, so that $Z_{t}:=(Z_{1,t},\dots,Z_{N,t})$, $Y_{t}:=(Y_{1,t},\dots,Y_{N,t})$, $\mathcal{H}_{t}:=((Z_{0},u_{0},Y_{0}),\dots,(Z_{t},u_{t},Y_{t}))$ and $\mathcal{F}_{t}:=((u_{0},Y_{0}),\dots,(u_{t},Y_{t}))$. Then we formalise the above as ($\mathbf{1}_{\cdot}$ is the indicator function)

\[
Z_{i,0}\sim\mathcal{N}(0,1),\qquad Z_{i,t+1}\mid\mathcal{H}_{t}\sim\mathcal{N}(Z_{i,t},1),\qquad Y_{i,t}\mid\mathcal{H}_{t-1},Z_{t},u_{t}\sim\mathcal{N}\left(Z_{i,t},\frac{\mathbf{1}_{u_{t}\neq i}}{a_{i}}+\frac{\mathbf{1}_{u_{t}=i}}{b_{i}}\right).
\]

Note that this setting is readily generalised to $\mathbb{E}[(Z_{i,t+1}-Z_{i,t})^{2}]\neq 1$ by a change of variables.

Thus the posterior belief is given by the Kalman filter as $Z_{i,t}\mid\mathcal{F}_{t}\sim\mathcal{N}(\hat{Z}_{i,t},x_{i,t})$ where the posterior mean is $\hat{Z}_{i,t}\in\mathbb{R}$ and the error variance $x_{i,t}\in\mathbb{R}_{+}$ satisfies

\[
x_{i,t+1}=\phi_{i,\mathbf{1}_{u_{t+1}=i}}(x_{i,t})\quad\text{where}\quad\phi_{i,0}(x):=\frac{x+1}{a_{i}x+a_{i}+1}\ \ \text{and}\ \ \phi_{i,1}(x):=\frac{x+1}{b_{i}x+b_{i}+1}. \tag{1}
\]
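The recursion (1) is easy to check numerically. The following is a minimal sketch in Python, not code from the paper: the function names `phi`, `phi_from_precisions` and `fixed_point` and the numerical precisions are illustrative assumptions. It evaluates the update both by the closed form in (1) and by the usual predict-then-update arithmetic on precisions, and iterates the map towards its fixed point.

```python
def phi(x, precision):
    """Error-variance update of (1): one random-walk step, then one observation."""
    return (x + 1.0) / (precision * x + precision + 1.0)


def phi_from_precisions(x, precision):
    """Same update written as: predict (variance grows by 1), then add precisions."""
    predicted_variance = x + 1.0
    return 1.0 / (1.0 / predicted_variance + precision)


def fixed_point(precision, x=1.0, iterations=200):
    """Iterate phi from x; for precision > 0 the iterates settle at the fixed point."""
    for _ in range(iterations):
        x = phi(x, precision)
    return x


if __name__ == "__main__":
    a, b = 0.1, 1.0  # assumed cheap and expensive precisions, for illustration only
    print(phi(2.0, a), phi_from_precisions(2.0, a))
    print(fixed_point(a), fixed_point(b))
```

For precision $b=1$ the fixed point solves $x^{2}+x-1=0$, so the iteration should approach $(\sqrt{5}-1)/2$.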

Problem KF1. Let $\pi$ be a policy so that $u_{t}=\pi(\mathcal{F}_{t-1})$. Let $x^{\pi}_{i,t}$ be the error variance under $\pi$. The problem is to choose $\pi$ so as to minimise the following objective for discount factor $\beta\in[0,1)$. The objective consists of a weighted sum of error variances $x^{\pi}_{i,t}$ with weights $w_{i}\in\mathbb{R}_{+}$ plus observation costs $h_{i}\in\mathbb{R}_{+}$ for $i=1,\dots,N$:

\[
\mathbb{E}\left[\sum_{t=0}^{\infty}\sum_{i=1}^{N}\beta^{t}\left\{h_{i}\mathbf{1}_{u_{t}=i}+w_{i}x_{i,t}^{\pi}\right\}\right]=\sum_{t=0}^{\infty}\sum_{i=1}^{N}\beta^{t}\left\{h_{i}\mathbf{1}_{u_{t}=i}+w_{i}x_{i,t}^{\pi}\right\}
\]

where the equality follows as (1) is a deterministic mapping (and assuming $\pi$ is deterministic).

Single-Arm Problem and Whittle Index. Now fix an arm $i$ and write $x_{t}^{\pi},\phi_{0}(\cdot),\dots$ instead of $x_{i,t}^{\pi},\phi_{i,0}(\cdot),\dots$. Say there are now two actions $u_{t}=0,1$ corresponding to cheap and expensive observations respectively and the expensive observation now costs $h+\nu$ where $\nu\in\mathbb{R}$. The single-arm problem is to choose a policy, which here is an action sequence, $\pi:=(u_{0},u_{1},\dots)$

\[
\text{so as to minimise}\quad V^{\pi}(x|\nu):=\sum_{t=0}^{\infty}\beta^{t}\left\{(h+\nu)u_{t}+wx_{t}^{\pi}\right\}\quad\text{where $x_{0}=x$.} \tag{2}
\]

Let $Q(x,\alpha|\nu)$ be the optimal cost-to-go in this problem if the first action must be $\alpha$ and let $\pi^{*}$ be an optimal policy, so that

\[
Q(x,\alpha|\nu):=(h+\nu)\alpha+wx+\beta V^{\pi^{*}}(\phi_{\alpha}(x)|\nu).
\]

For any fixed $x\in\mathbb{R}_{+}$, the value of $\nu$ for which actions $u_{0}=0$ and $u_{0}=1$ are both optimal is known as the Whittle index $\lambda^{W}(x)$, assuming it exists and is unique. In other words

\[
\text{The Whittle index $\lambda^{W}(x)$ is the solution to}\quad Q(x,0|\lambda^{W}(x))=Q(x,1|\lambda^{W}(x)). \tag{3}
\]

Let us consider a policy which takes action $u_{0}=\alpha$ then acts optimally, producing actions $u_{t}^{\alpha*}(x)$ and error variances $x_{t}^{\alpha*}(x)$. Then (3) gives

\[
\sum_{t=0}^{\infty}\beta^{t}\left\{(h+\lambda^{W}(x))u^{0*}_{t}+wx_{t}^{0*}(x)\right\}=\sum_{t=0}^{\infty}\beta^{t}\left\{(h+\lambda^{W}(x))u^{1*}_{t}+wx_{t}^{1*}(x)\right\}.
\]

Solving this linear equation for the index $\lambda^{W}(x)$ gives

\[
\lambda^{W}(x)=w\frac{\sum_{t=1}^{\infty}\beta^{t}(x_{t}^{0*}(x)-x_{t}^{1*}(x))}{\sum_{t=0}^{\infty}\beta^{t}(u^{1*}_{t}(x)-u^{0*}_{t}(x))}-h. \tag{4}
\]

Whittle [24] recognised that for his index policy (play the arm with the largest $\lambda^{W}(x)$) to make sense, any arm which receives an expensive observation for added cost $\nu$ must also receive an expensive observation for added cost $\nu^{\prime}<\nu$. Such problems are said to be indexable. The question resolved by this paper is whether Problem KF1 is indexable. Equivalently, is $\lambda^{W}(x)$ non-decreasing in $x\in\mathbb{R}_{+}$?

3 Main Result, Key Concepts and Intuition

We make the following intuitive assumption about threshold (monotone) policies.
A1. For some $x\in\mathbb{R}_{+}$ depending on $\nu\in\mathbb{R}$, the policy $u_{t}=\mathbf{1}_{x_{t}\geq x}$ is optimal for problem (2).

Note that under A1, definition (3) means the policy $u_{t}=\mathbf{1}_{x_{t}>x}$ is also optimal, so we can choose

\[
\left.\begin{aligned}
u_{t}^{0*}(x)&:=\begin{cases}0&\text{if $x_{t-1}^{0*}(x)\leq x$}\\ 1&\text{otherwise}\end{cases}&\quad\text{and}\quad x_{t}^{0*}(x)&:=\begin{cases}\phi_{0}(x_{t-1}^{0*}(x))&\text{if $x_{t-1}^{0*}(x)\leq x$}\\ \phi_{1}(x_{t-1}^{0*}(x))&\text{otherwise}\end{cases}\\
u_{t}^{1*}(x)&:=\begin{cases}0&\text{if $x_{t-1}^{1*}(x)<x$}\\ 1&\text{otherwise}\end{cases}&\quad\text{and}\quad x_{t}^{1*}(x)&:=\begin{cases}\phi_{0}(x_{t-1}^{1*}(x))&\text{if $x_{t-1}^{1*}(x)<x$}\\ \phi_{1}(x_{t-1}^{1*}(x))&\text{otherwise}\end{cases}
\end{aligned}\quad\right\} \tag{5}
\]

where $x_{0}^{0*}(x)=x_{0}^{1*}(x)=x$. We refer to $x_{t}^{0*}(x),x_{t}^{1*}(x)$ as the $x$-threshold orbits (Figure 1).

We are now ready to state our main result.

Theorem 1. Suppose a threshold policy (A1) is optimal for the single-arm problem (2). Then Problem KF1 is indexable. Specifically, for any $b>a\geq 0$ let

\[
\phi_{0}(x):=\frac{x+1}{ax+a+1},\qquad\phi_{1}(x):=\frac{x+1}{bx+b+1}
\]

and for any $w\in\mathbb{R}_{+}$, $h\in\mathbb{R}$ and $0<\beta<1$, let

\[
\lambda^{W}(x):=w\frac{\sum_{t=1}^{\infty}\beta^{t}(x_{t}^{0*}(x)-x_{t}^{1*}(x))}{\sum_{t=0}^{\infty}\beta^{t}(u^{1*}_{t}(x)-u^{0*}_{t}(x))}-h \tag{6}
\]

in which action sequences $u_{t}^{0*}(x),u_{t}^{1*}(x)$ and error variance sequences $x_{t}^{0*}(x),x_{t}^{1*}(x)$ are given in terms of $\phi_{0},\phi_{1}$ by (5). Then $\lambda^{W}(x)$ is a continuous and non-decreasing function of $x\in\mathbb{R}_{+}$.
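The index (6) can be approximated by simulating the threshold orbits (5) and truncating the discounted sums. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the names `threshold_orbits` and `whittle_index`, the truncation `horizon` and the numerical parameters are ours. Since (5) defines the actions for $t\geq 1$ only, the sketch assumes the $t=0$ action terms of the two orbits coincide and cancel in the denominator.

```python
def phi(x, precision):
    """Error-variance map of Theorem 1 for a given observation precision."""
    return (x + 1.0) / (precision * x + precision + 1.0)


def threshold_orbits(x, a, b, horizon):
    """Simulate (5) for t = 1..horizon; returns (u0, x0, u1, x1) as lists."""
    u0, x0, u1, x1 = [], [], [], []
    prev0 = prev1 = x
    for _ in range(horizon):
        act0 = 0 if prev0 <= x else 1  # orbit 0*: passive on ties
        act1 = 0 if prev1 < x else 1   # orbit 1*: active on ties
        prev0 = phi(prev0, b if act0 else a)
        prev1 = phi(prev1, b if act1 else a)
        u0.append(act0)
        x0.append(prev0)
        u1.append(act1)
        x1.append(prev1)
    return u0, x0, u1, x1


def whittle_index(x, a, b, beta, w=1.0, h=0.0, horizon=500):
    """Truncated version of (6); assumes the t = 0 action terms cancel."""
    u0, x0, u1, x1 = threshold_orbits(x, a, b, horizon)
    numerator = sum(beta ** (t + 1) * (x0[t] - x1[t]) for t in range(horizon))
    denominator = sum(beta ** (t + 1) * (u1[t] - u0[t]) for t in range(horizon))
    return w * numerator / denominator - h


if __name__ == "__main__":
    a, b, beta = 0.1, 1.0, 0.9  # assumed parameters, for illustration only
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(x, whittle_index(x, a, b, beta))
```

If Theorem 1 holds, the printed values should not decrease as $x$ grows, up to truncation and rounding error.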

Figure 1: Orbit $x^{0*}_{t}(x)$ traces the path $ABCDE\dots$ for the word $01w=01101$. Orbit $x^{1*}_{t}(x)$ traces the path $FGHIJ\dots$ for the word $10w=10101$. Word $w=101$ is a palindrome.

We are now ready to describe the key concepts underlying this result.

Words. In this paper, a word $w$ is a string on $\{0,1\}^{*}$ with $k^{\rm th}$ letter $w_{k}$ and $w_{i:j}:=w_{i}w_{i+1}\dots w_{j}$. The empty word is $\epsilon$, the concatenation of words $u,v$ is $uv$, the word that is the $n$-fold repetition of $w$ is $w^{n}$, the infinite repetition of $w$ is $w^{\omega}$ and $\tilde{w}$ is the reverse of $w$, so $w=\tilde{w}$ means $w$ is a palindrome. The length of $w$ is $|w|$ and $|w|_{u}$ is the number of times that word $u$ appears in $w$, overlaps included.

Christoffel, Sturmian and Mechanical Words. It turns out that the action sequences in (5) are given by such words, so the following definitions are central to this paper.

The Christoffel tree (Figure 2) is an infinite complete binary tree [5] in which each node is labelled with a pair $(u,v)$ of words. The root is $(0,1)$ and the children of $(u,v)$ are $(u,uv)$ and $(uv,v)$. The Christoffel words are the words $0,1$ and the concatenations $uv$ for all $(u,v)$ in that tree. The fractions $|uv|_{1}/|uv|_{0}$ form the Stern-Brocot tree [9] which contains each positive rational number exactly once. Also, infinite paths in the Stern-Brocot tree converge to the positive irrational numbers. Analogously, Sturmian words could be thought of as infinitely-long Christoffel words.
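The tree construction is short enough to enumerate directly. The following Python sketch (the helper names `christoffel_pairs`, `christoffel_words`, `slope` and `central_word` are ours) builds the first few levels of the Christoffel tree, forms the words $uv$, and checks two properties used in this paper: the fractions $|uv|_{1}/|uv|_{0}$ are all distinct, and the word $w$ in $uv=0w1$ is a palindrome.

```python
from fractions import Fraction


def christoffel_pairs(depth):
    """Pairs (u, v) labelling the nodes at the given depth of the Christoffel tree."""
    level = [("0", "1")]
    for _ in range(depth):
        level = [child for (u, v) in level for child in ((u, u + v), (u + v, v))]
    return level


def christoffel_words(max_depth):
    """Concatenations uv for all nodes down to max_depth (root is depth 0)."""
    words = []
    for depth in range(max_depth + 1):
        words.extend(u + v for (u, v) in christoffel_pairs(depth))
    return words


def slope(word):
    """Ratio of the number of 1s to the number of 0s, as an exact fraction."""
    return Fraction(word.count("1"), word.count("0"))


def central_word(word):
    """The word w such that the Christoffel word equals 0w1."""
    return word[1:-1]


if __name__ == "__main__":
    for word in christoffel_words(2):
        w = central_word(word)
        print(word, slope(word), w == w[::-1])
```

At depth 2 this lists the words 0001, 00101, 01011 and 0111, whose central words 00, 010, 101 and 11 are palindromes.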

Alternatively, among many known characterisations, the Christoffel words can be defined as the words $0,1$ and the words $0w1$ where $a:=|0w1|_{1}/|0w1|$ and

\[
(01w)_{n}:=\lfloor(n+1)a\rfloor-\lfloor na\rfloor
\]

for any relatively prime natural numbers $|0w1|_{0}$ and $|0w1|_{1}$ and for $n=1,2,\dots,|0w1|$. The Sturmian words are then the infinite words $0w_{1}w_{2}\cdots$ where, for $n=1,2,\dots$ and $a\in(0,1)\backslash\mathbb{Q}$,

\[
(01w_{1}w_{2}\cdots)_{n}:=\lfloor(n+1)a\rfloor-\lfloor na\rfloor.
\]

We use the notation $0w1$ for Sturmian words although they are infinite.

Figure 2: Part of the Christoffel tree.

The set of mechanical words is the union of the Christoffel and Sturmian words [13]. (Note that the mechanical words are sometimes defined in terms of infinite repetitions of the Christoffel words.)

Majorisation. As in [14], let $x,y\in\mathbb{R}^{m}$ and let $x_{(i)}$ and $y_{(i)}$ be their elements sorted in ascending order. We say $x$ is weakly supermajorised by $y$ and write $x\prec^{w}y$ if

\[
\sum_{k=1}^{j}x_{(k)}\geq\sum_{k=1}^{j}y_{(k)}\qquad\text{for all $j=1,\dots,m$.}
\]

If this is an equality for $j=m$ we say $x$ is majorised by $y$ and write $x\prec y$. It turns out that

\[
x\prec y\qquad\Leftrightarrow\qquad\sum_{k=1}^{j}x_{[k]}\leq\sum_{k=1}^{j}y_{[k]}\quad\text{for $j=1,\dots,m-1$ with equality for $j=m$}
\]

where $x_{[k]},y_{[k]}$ are the sequences sorted in descending order. For $x,y\in\mathbb{R}^{m}$ we have [14]

\[
x\prec y\qquad\Leftrightarrow\qquad\sum_{i=1}^{m}f(x_{i})\leq\sum_{i=1}^{m}f(y_{i})\quad\text{for all convex functions $f:\mathbb{R}\rightarrow\mathbb{R}$.}
\]

More generally, a real-valued function $\phi$ defined on a subset $\mathcal{A}$ of $\mathbb{R}^{m}$ is said to be Schur-convex on $\mathcal{A}$ if $x\prec y$ implies that $\phi(x)\leq\phi(y)$.
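These definitions translate directly into partial-sum comparisons. The following Python sketch (the function names and the tolerance `tol` are our own choices) tests weak supermajorisation and majorisation via ascending partial sums, and illustrates the convex-sum characterisation on a small example.

```python
def weakly_supermajorised(x, y, tol=1e-12):
    """True if x is weakly supermajorised by y (ascending partial sums of x dominate)."""
    partial_x = partial_y = 0.0
    for x_value, y_value in zip(sorted(x), sorted(y)):
        partial_x += x_value
        partial_y += y_value
        if partial_x < partial_y - tol:
            return False
    return True


def majorised(x, y, tol=1e-12):
    """True if x is majorised by y: weak supermajorisation plus equal totals."""
    return weakly_supermajorised(x, y, tol) and abs(sum(x) - sum(y)) <= tol


def convex_sum(values, f):
    """Sum of f over the entries, a Schur-convex function when f is convex."""
    return sum(f(value) for value in values)


if __name__ == "__main__":
    x, y = [2.0, 2.0], [1.0, 3.0]
    print(majorised(x, y))
    print(convex_sum(x, lambda t: t * t), convex_sum(y, lambda t: t * t))
```

Here $(2,2)\prec(1,3)$, and the sum of squares grows from 8 to 10, as the convex-function characterisation requires.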

Möbius Transformations. Let $\mu_{A}(x)$ denote the Möbius transformation $\mu_{A}(x):=\frac{A_{11}x+A_{12}}{A_{21}x+A_{22}}$ where $A\in\mathbb{R}^{2\times 2}$. Möbius transformations such as $\phi_{0}(\cdot),\phi_{1}(\cdot)$ are closed under composition, so for any word $w$ we define $\phi_{w}(x):=\phi_{w_{|w|}}\circ\dots\circ\phi_{w_{2}}\circ\phi_{w_{1}}(x)$ and $\phi_{\epsilon}(x):=x$.

Intuition. Here is the intuition behind our main result.

For any $x\in\mathbb{R}_{+}$, the orbits in (5) correspond to a particular mechanical word $0,1$ or $0w1$ depending on the value of $x$ (Figure 1). Specifically, for any word $u$, let $y_{u}$ be the fixed point of the mapping $\phi_{u}$ on $\mathbb{R}_{+}$ so that $\phi_{u}(y_{u})=y_{u}$ and $y_{u}\in\mathbb{R}_{+}$. Then the word corresponding to $x$ is $1$ for $0\leq x\leq y_{1}$, $0w1$ for $x\in[y_{01w},y_{10w}]$ and $0$ for $y_{0}\leq x<\infty$. In passing we note that these fixed points are sorted in ascending order by the ratio $\rho:=|01w|_{0}/|01w|_{1}$ of counts of 0s to counts of 1s, as illustrated by Figure 3. Interestingly, it turns out that ratio $\rho$ is a piecewise-constant yet continuous function of $x$, reminiscent of the Cantor function.

Also, composition of Möbius transformations corresponds to matrix multiplication (the map $A\mapsto\mu_{A}$ is a homomorphism), so that

\[
\mu_{A}\circ\mu_{B}(x)=\mu_{AB}(x)\qquad\text{for any $A,B\in\mathbb{R}^{2\times 2}$.}
\]

Thus, the index (6) can be written in terms of the orbits of a linear system (11) given by $0,1$ or $0w1$. Further, if $A\in\mathbb{R}^{2\times 2}$ and $\det(A)=1$ then the gradient of the corresponding Möbius transformation is the convex function

\[
\frac{d\mu_{A}(x)}{dx}=\frac{1}{(A_{21}x+A_{22})^{2}}.
\]
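Both facts about Möbius transformations can be checked numerically. The sketch below is a hedged illustration in Python: the helper names `mobius`, `matmul` and `mobius_derivative` and the numerical matrices (of the same form as $\phi_{0},\phi_{1}$, with assumed precisions 0.5 and 2) are ours. It compares $\mu_{A}\circ\mu_{B}$ with $\mu_{AB}$ and compares the derivative formula for $\det(A)=1$ with a central finite difference.

```python
def mobius(A, x):
    """Moebius transformation mu_A(x) for a 2x2 matrix A given as nested tuples."""
    (a11, a12), (a21, a22) = A
    return (a11 * x + a12) / (a21 * x + a22)


def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def mobius_derivative(A, x):
    """Derivative of mu_A at x, valid when det(A) = 1."""
    return 1.0 / (A[1][0] * x + A[1][1]) ** 2


if __name__ == "__main__":
    A = ((1.0, 1.0), (0.5, 1.5))  # same form as phi_0 with assumed precision 0.5
    B = ((1.0, 1.0), (2.0, 3.0))  # same form as phi_1 with assumed precision 2
    x, step = 0.7, 1e-6
    print(mobius(A, mobius(B, x)), mobius(matmul(A, B), x))
    finite_difference = (mobius(A, x + step) - mobius(A, x - step)) / (2 * step)
    print(finite_difference, mobius_derivative(A, x))
```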

So the gradient of the index is the difference of the sums of a convex function of the linear-system orbits. However, such sums are Schur-convex functions and it follows that the index is increasing because one orbit weakly supermajorises the other, as we now show for the case $0w1$ (noting that the proof is easier for words $0,1$). As $0w1$ is a mechanical word, $w$ is a palindrome. Further, if $w$ is a palindrome, it turns out that the difference between the linear-system orbits increases with $x$. So, we might define the majorisation point for $w$ as the $x$ for which one orbit majorises the other. Quite remarkably, if $w$ is a palindrome then the majorisation point is $\phi_{w}(0)$ (Proposition 7). Indeed the black circles and blue dots of Figure 3 coincide. Finally, $\phi_{w}(0)$ is less than or equal to $y_{01w}$ which is the least $x$ for which the orbits correspond to the word $0w1$. Indeed, the blue dots of Figure 3 are below the corresponding black dots. Thus one orbit does indeed supermajorise the other.

Figure 3: Lower fixed points $y_{01w}$ of Christoffel words (black dots), majorisation points for those words (black circles) and the tree of $\phi_{w}(0)$ (blue).

4 Proof of Main Result

4.1 Mechanical Words

The Möbius transformations of (1) satisfy the following assumption for $\mathcal{I}:=\mathbb{R}_{+}$. We prove in the supplementary material that the fixed point $y_{w}$ of word $w$ (the solution to $\phi_{w}(x)=x$ on $\mathcal{I}$) is unique.

Assumption A2. Functions $\phi_{0}:\mathcal{I}\rightarrow\mathcal{I},\phi_{1}:\mathcal{I}\rightarrow\mathcal{I}$, where $\mathcal{I}$ is an interval of $\mathbb{R}$, are increasing and non-expansive, so for all $x,y\in\mathcal{I}:x<y$ and for $k\in\{0,1\}$ we have

\[
\underbrace{\phi_{k}(x)<\phi_{k}(y)}_{\text{increasing}}\qquad\qquad\text{and}\qquad\qquad\underbrace{\phi_{k}(y)-\phi_{k}(x)<y-x}_{\text{non-expansive}}.
\]

Furthermore, the fixed points $y_{0},y_{1}$ of $\phi_{0},\phi_{1}$ on $\mathcal{I}$ satisfy $y_{1}<y_{0}$.

Hence the following two propositions (supplementary material) apply to $\phi_{0},\phi_{1}$ of (1) on $\mathcal{I}=\mathbb{R}_{+}$.

Proposition 1.

Suppose A2 holds, $x\in\mathcal{I}$ and $w$ is a non-empty word. Then

\[
x<\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)<y_{w}\ \Leftrightarrow\ x<y_{w}\qquad\text{and}\qquad x>\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)>y_{w}\ \Leftrightarrow\ x>y_{w}.
\]

For a given $x$, in the notation of (5), we call the shortest word $u$ such that $(u^{1*}_{1},u^{1*}_{2},\dots)=u^{\omega}$ the $x$-threshold word. Proposition 2 generalises a recent result about $x$-threshold words in a setting where $\phi_{0},\phi_{1}$ are linear [18].

Proposition 2.

Suppose A2 holds and $0w1$ is a mechanical word. Then

\[
\text{$0w1$ is the $x$-threshold word}\ \Leftrightarrow\ x\in[y_{01w},y_{10w}].
\]

Also, if $x_{0},x_{1}\in\mathcal{I}$ with $x_{0}\geq y_{0}$ and $x_{1}\leq y_{1}$ then the $x_{0}$- and $x_{1}$-threshold words are $0$ and $1$.

We also use the following very interesting fact (Proposition 4.2 on p.28 of [5]).

Proposition 3.

Suppose $0w1$ is a mechanical word. Then $w$ is a palindrome.

4.2 Properties of the Linear-System Orbits $M(w)$ and Prefix Sums $S(w)$

Definition. Assume that $a,b\in\mathbb{R}_{+}$ and $a<b$. Consider the matrices

\[
F:=\begin{pmatrix}1&1\\ a&1+a\end{pmatrix},\qquad G:=\begin{pmatrix}1&1\\ b&1+b\end{pmatrix}\quad\text{and}\quad K:=\begin{pmatrix}-1&-1\\ 0&1\end{pmatrix}
\]

so that the Möbius transformations $\mu_{F},\mu_{G}$ are the functions $\phi_{0},\phi_{1}$ of (1) and $GF-FG=(b-a)K$. Given any word $w\in\{0,1\}^{*}$, we define the matrix product $M(w)$

\[
M(w):=M(w_{|w|})\cdots M(w_{1}),\quad\text{where $M(\epsilon):=I$, $M(0):=F$ and $M(1):=G$}
\]

where $I\in\mathbb{R}^{2\times 2}$ is the identity and the prefix sum $S(w)$ as the matrix polynomial

\[
S(w):=\sum_{k=1}^{|w|}M(w_{1:k}),\qquad\text{where $S(\epsilon):=0$ (the all-zero matrix).} \tag{7}
\]

For any $A\in\mathbb{R}^{2\times 2}$, let $\text{tr}(A)$ be the trace of $A$, let $A_{ij}=[A]_{ij}$ be the entries of $A$ and let $A\geq 0$ indicate that all entries of $A$ are non-negative.

Remark. Clearly, $\det(F)=\det(G)=1$ so that $\det(M(w))=1$ for any word $w$. Also, $S(w)$ corresponds to the partial sums of the linear-system orbits, as hinted in the previous section.
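The definitions of $M(w)$ and $S(w)$ can be exercised with exact rational arithmetic. The following Python sketch is illustrative only: the function names, the word `01101` and the values of $a,b$ are assumptions chosen for the example. It builds $M(w)$ in the order $M(w_{|w|})\cdots M(w_{1})$, forms the prefix sum (7), and checks the identities $GF-FG=(b-a)K$ and $\det(M(w))=1$.

```python
from fractions import Fraction

IDENTITY = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))


def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def matadd(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return tuple(tuple(A[i][j] + B[i][j] for j in range(2)) for i in range(2))


def letter_matrix(letter, a, b):
    """F for the letter 0 and G for the letter 1."""
    c = b if letter == "1" else a
    return ((1, 1), (c, 1 + c))


def M(word, a, b):
    """M(w) = M(w_|w|) ... M(w_1): each new letter multiplies on the left."""
    result = IDENTITY
    for letter in word:
        result = matmul(letter_matrix(letter, a, b), result)
    return result


def S(word, a, b):
    """Prefix sum (7): sum of M(w_{1:k}) over k = 1..|w|."""
    total = ZERO
    for k in range(1, len(word) + 1):
        total = matadd(total, M(word[:k], a, b))
    return total


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


if __name__ == "__main__":
    a, b = Fraction(1, 3), Fraction(2)  # assumed values with a < b
    print(M("01101", a, b), det(M("01101", a, b)))
    print(S("01101", a, b))
```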

The following proposition captures the role of palindromes (proof in the supplementary material).

Proposition 4.

Suppose $w$ is a word, $p$ is a palindrome and $n\in\mathbb{Z}_{+}$. Then

  1. $M(p)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix}$ for some $f,h\in\mathbb{R}$,

  2. $\text{tr}(M(10p))=\text{tr}(M(01p))$,

  3. If $u\in\{p(10p)^{n},(10p)^{n}10\}$ then $M(u)-M(\tilde{u})=\lambda K$ for some $\lambda\in\mathbb{R}_{-}$,

  4. If $w$ is a prefix of $p$ then $[M(p(10p)^{n}10w)]_{22}\leq[M(p(01p)^{n}01w)]_{22}$,

  5. $[M((10p)^{n}10w)]_{21}\geq[M((01p)^{n}01w)]_{21}$,

  6. $[M((10p)^{n}1)]_{21}\geq[M((01p)^{n}0)]_{21}$.

We now demonstrate a surprisingly simple relation between $S(w)$ and $M(w)$.

Proposition 5.

Suppose $w$ is a palindrome. Then

\[
S_{21}(w)=M_{22}(w)-1\qquad\text{and}\qquad S_{22}(w)=M_{12}(w)+S_{21}(w). \tag{8}
\]

Furthermore, if $\Delta_{k}:=[S(10w)M(w(10w)^{k})-S(01w)M(w(01w)^{k})]_{22}$ then

\[
\Delta_{k}=0\qquad\text{for all $k\in\mathbb{Z}_{+}$.} \tag{9}
\]
Proof.

Let us write $M:=M(w),S:=S(w)$. We prove (8) by induction on $|w|$. In the base case $w\in\{\epsilon,0,1\}$. For $w=\epsilon$, $M_{22}-1=0=S_{21}$ and $M_{12}+S_{21}=0=S_{22}$. For $w\in\{0,1\}$, $M_{22}-1=c=S_{21}$ and $M_{12}+S_{21}=1+c=S_{22}$ for some $c\in\{a,b\}$. For the inductive step, in accordance with Claim 1 of Proposition 19, assume $w\in\{0v0,1v1\}$ for some word $v$ satisfying

\[
M(v)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix},\qquad S(v)=\begin{pmatrix}c&d\\ h-1&f+h-1\end{pmatrix}\quad\text{for some $c,d,f,h\in\mathbb{R}$.}
\]

For $w=1v1$, $M:=M(1v1)=GM(v)G$ and $S:=S(1v1)=GM(v)G+S(v)G+G$. Calculating the corresponding matrix products and sums gives

\[
\begin{aligned}
S_{21}&=(bh+h+bf-1)(bh+2h+bf+f+1)(h+f)^{-1}=M_{22}-1\\
S_{22}-S_{21}&=bh+2h+bf+f=M_{12}
\end{aligned}
\]

as claimed. For $w=0v0$ the claim also holds as $F=\left.G\right|_{b=a}$. This completes the proof of (8).

Furthermore Part. Let $A:=S(w)FG+FG+G$ and $B:=S(w)GF+GF+F$. Then

\[
\Delta_{k}=[(A(M(w)FG)^{k}-B(M(w)GF)^{k})M(w)]_{22} \tag{10}
\]

by definition of $S(\cdot)$. By Claim 1 of Proposition 19 and (8) we know that

\[
M(w)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix},\qquad S(w)=\begin{pmatrix}c&d\\ h-1&f+h-1\end{pmatrix}\quad\text{for some $c,d,f,h\in\mathbb{R}$.}
\]

Substituting these expressions and the definitions of $F,G$ into the definitions of $A,B$ and then into (10) for $k\in\{0,1\}$ directly gives $\Delta_{0}=\Delta_{1}=0$ (although this calculation is long).

Now consider the case $k\geq 2$. Claim 2 of Proposition 19 says $\text{tr}(M(10w))=\text{tr}(M(01w))$ and clearly $\det(M(10w))=\det(M(01w))=1$. Thus we can diagonalise as

\[
M(w)FG=:UDU^{-1},\qquad M(w)GF=:VDV^{-1},\qquad D:=\text{diag}(\lambda,1/\lambda)\quad\text{for some $\lambda\geq 1$}
\]

so that $\Delta_{k}=[AUD^{k}U^{-1}M(w)-BVD^{k}V^{-1}M(w)]_{22}=:\gamma_{1}\lambda^{k}+\gamma_{2}\lambda^{-k}$. So, if $\lambda=1$ then $\Delta_{k}=\gamma_{1}+\gamma_{2}=\Delta_{0}$ and we already showed that $\Delta_{0}=0$. Otherwise $\lambda\neq 1$, so $\Delta_{0}=\Delta_{1}=0$ implies $\gamma_{1}+\gamma_{2}=\gamma_{1}\lambda+\gamma_{2}\lambda^{-1}=0$ which gives $\gamma_{1}=\gamma_{2}=0$. Thus for any $k\in\mathbb{Z}_{+}$ we have $\Delta_{k}=\gamma_{1}\lambda^{k}+\gamma_{2}\lambda^{-k}=0$. ∎
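Proposition 5 lends itself to an exact spot check. The sketch below is a sanity check under assumed parameter values, not a substitute for the proof: the function names and the values of $a,b$ are ours. It verifies (8) for a few palindromes and evaluates $\Delta_{k}$ of (9) for small $k$ using rational arithmetic, so equalities are tested exactly.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def M(word, a, b):
    """M(w) = M(w_|w|) ... M(w_1) with M(0) = F and M(1) = G."""
    result = ((1, 0), (0, 1))
    for letter in word:
        c = b if letter == "1" else a
        result = matmul(((1, 1), (c, 1 + c)), result)
    return result


def S(word, a, b):
    """Prefix sum (7) of the matrices M(w_{1:k})."""
    total = [[0, 0], [0, 0]]
    for k in range(1, len(word) + 1):
        prefix = M(word[:k], a, b)
        for i in range(2):
            for j in range(2):
                total[i][j] += prefix[i][j]
    return tuple(tuple(row) for row in total)


def relation_8_holds(w, a, b):
    """Check S21 = M22 - 1 and S22 = M12 + S21 for the word w."""
    Mw, Sw = M(w, a, b), S(w, a, b)
    return Sw[1][0] == Mw[1][1] - 1 and Sw[1][1] == Mw[0][1] + Sw[1][0]


def delta(w, k, a, b):
    """Delta_k of Proposition 5, computed from its definition."""
    left = matmul(S("10" + w, a, b), M(w + ("10" + w) * k, a, b))
    right = matmul(S("01" + w, a, b), M(w + ("01" + w) * k, a, b))
    return left[1][1] - right[1][1]


if __name__ == "__main__":
    a, b = Fraction(1, 3), Fraction(2)  # assumed values with a < b
    for w in ("", "0", "1", "010", "101", "0110"):
        print(repr(w), relation_8_holds(w, a, b), [delta(w, k, a, b) for k in range(4)])
```

For the non-palindrome `01` the first relation in (8) reduces to $a=b$, so `relation_8_holds` returns `False` whenever $a<b$.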

4.3 Majorisation

The following is a straightforward consequence of results in [14] proved in the supplementary material. We emphasize that the notation $\prec^{w}$ has nothing to do with the notion of $w$ as a word.

Proposition 6.

Suppose $x,y\in\mathbb{R}_{+}^{m}$ and $f:\mathbb{R}\rightarrow\mathbb{R}$ is a symmetric function that is convex and decreasing on $\mathbb{R}_{+}$. Then

\[
\text{$x\prec^{w}y$ and $\beta\in[0,1]$}\quad\Rightarrow\quad\sum_{i=1}^{m}\beta^{i}f(x_{(i)})\geq\sum_{i=1}^{m}\beta^{i}f(y_{(i)}).
\]

For any $x\in\mathbb{R}$ and any fixed word $w$, define the sequences for $n\in\mathbb{Z}_{+}$ and $k=1,\dots,m$

\[
\left.\begin{aligned}
x_{nm+k}(x)&:=[M((10w)^{n}(10w)_{1:k})v(x)]_{2},&\sigma_{x}^{(n)}&:=(x_{nm+1}(x),\dots,x_{nm+m}(x))\\
y_{nm+k}(x)&:=[M((01w)^{n}(01w)_{1:k})v(x)]_{2},&\sigma_{y}^{(n)}&:=(y_{nm+1}(x),\dots,y_{nm+m}(x))
\end{aligned}\right\} \tag{11}
\]

where $m:=|10w|$ and $v(x):=(x,1)^{T}$.

Proposition 7.

Suppose $w$ is a palindrome and $x\geq\phi_{w}(0)$. Then $\sigma_{x}^{(n)}$ and $\sigma_{y}^{(n)}$ are ascending sequences on $\mathbb{R}_{+}$ and $\sigma_{x}^{(n)}\prec^{w}\sigma_{y}^{(n)}$ for any $n\in\mathbb{Z}_{+}$.

Proof.

Clearly ϕw(0)0\phi_{w}(0)\geq 0 so x0x\geq 0 and hence v(x)0v(x)\geq 0. So for any word uu and letter c{0,1}c\in\{0,1\} we have M(uc)v(x)=M(c)M(u)v(x)M(u)v(x)0M(uc)v(x)=M(c)M(u)v(x)\geq M(u)v(x)\geq 0 as M(c)IM(c)\geq I. Thus xk+1(x)xk(x)0x_{k+1}(x)\geq x_{k}(x)\geq 0 and yk+1(x)yk(x)0y_{k+1}(x)\geq y_{k}(x)\geq 0. In conclusion, σx(n)\sigma_{x}^{(n)} and σy(n)\sigma_{y}^{(n)} are ascending sequences on +\mathbb{R}_{+}.

Now $\phi_{w}(0)=\frac{[M(w)]_{12}}{[M(w)]_{22}}$, so $v(\phi_{w}(0))$ is the second column of $M(w)$ divided by $[M(w)]_{22}$. Thus $[Av(\phi_{w}(0))]_{2}=\frac{[AM(w)]_{22}}{[M(w)]_{22}}$ for any $A\in\mathbb{R}^{2\times 2}$. So

\[ \begin{aligned} &x_{nm+k}(\phi_{w}(0))-y_{nm+k}(\phi_{w}(0))\\ &\quad=\frac{1}{[M(w)]_{22}}\left[(M((10w)^{n}(10w)_{1:k})-M((01w)^{n}(01w)_{1:k}))M(w)\right]_{22}\leq 0 \end{aligned} \]

for $k=2,\dots,m$ by Claim 4 of Proposition 19. So all but the first term of the sum $T_{m}(\phi_{w}(0))$ are non-positive, where

\[ T_{j}(x):=\sum_{k=1}^{j}(x_{nm+k}(x)-y_{nm+k}(x)). \]

Thus $T_{1}(\phi_{w}(0))\geq T_{2}(\phi_{w}(0))\geq\dots\geq T_{m}(\phi_{w}(0))$. But

\[ \begin{aligned} T_{m}(\phi_{w}(0))&=\frac{1}{[M(w)]_{22}}\sum_{k=1}^{m}\left[(M((10w)^{n}(10w)_{1:k})-M((01w)^{n}(01w)_{1:k}))M(w)\right]_{22}\\ &=\frac{1}{[M(w)]_{22}}\left[S(10w)M(w(10w)^{n})-S(01w)M(w(01w)^{n})\right]_{22}=0 \end{aligned} \]

where the last step follows from (9). So $T_{j}(\phi_{w}(0))\geq 0$ for $j=1,\dots,m$. Yet Claims 5 and 6 of Proposition 19 give

\[ \frac{d}{dx}T_{j}(x)=\sum_{k=1}^{j}[M((10w)^{n}(10w)_{1:k})-M((01w)^{n}(01w)_{1:k})]_{21}\geq 0. \]

So for $x\geq\phi_{w}(0)$ we have $T_{j}(x)\geq 0$ for $j=1,\dots,m$ which means that $\sigma_{x}^{(n)}\prec^{w}\sigma_{y}^{(n)}$. ∎
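The last step of this proof uses the partial-sum form of weak supermajorisation: for ascending sequences, $T_{j}\geq 0$ for every $j$. The following is a minimal Python sketch of that criterion, assuming the standard definition of $\prec^{w}$ from [14] (the sum of the $j$ smallest entries of the first sequence is at least that of the second, for every $j$). The function name and the tolerance are illustrative choices, not part of the paper.

```python
from itertools import accumulate

def weakly_supermajorised(x, y, tol=1e-12):
    """Return True if x is weakly supermajorised by y (x ≺^w y), partial-sum form."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    xs, ys = sorted(x), sorted(y)  # ascending order statistics
    # Every partial sum of the smallest entries of x must dominate that of y.
    return all(sx >= sy - tol for sx, sy in zip(accumulate(xs), accumulate(ys)))

print(weakly_supermajorised([2.0, 2.0], [1.0, 3.0]))  # partial sums 2 >= 1 and 4 >= 4
print(weakly_supermajorised([1.0, 3.0], [2.0, 2.0]))  # first partial sum 1 < 2
```

Sorting is only needed for arbitrary inputs; the sequences $\sigma_{x}^{(n)}$ and $\sigma_{y}^{(n)}$ are already ascending.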

4.4 Indexability

Theorem 1.

The index $\lambda^{W}(x)$ of (6) is continuous and non-decreasing for $x\in\mathbb{R}_{+}$.

Proof.

As weight $w$ is non-negative and cost $h$ is a constant we only need to prove the result for $\lambda(x):=\left.\lambda^{W}(x)\right|_{w=1,h=0}$ and we can use $w$ to denote a word. By Proposition 2, $x\in[y_{01w},y_{10w}]$ for some mechanical word $0w1$. (Cases $x\notin(y_{1},y_{0})$ are clarified in the supplementary material.)

Let us show that the hypotheses of Proposition 7 are satisfied by $w$ and $x$. Firstly, $w$ is a palindrome by Proposition 3. Secondly, $\phi_{w01}(0)\geq 0$ and as $\phi_{w}(\cdot)$ is monotonically increasing, it follows that $\phi_{w}\circ\phi_{w01}(0)\geq\phi_{w}(0)$. Equivalently, $\phi_{01w}\circ\phi_{w}(0)\geq\phi_{w}(0)$ so that $\phi_{w}(0)\leq y_{01w}$ by Proposition 1. Hence $x\geq y_{01w}\geq\phi_{w}(0)$.

Thus Proposition 7 applies, showing that the sequences $\sigma_{x}^{(n)}$ and $\sigma_{y}^{(n)}$, with elements $x_{nm+k}(x)$ and $y_{nm+k}(x)$ as defined in (11), are non-decreasing sequences on $\mathbb{R}_{+}$ with $\sigma_{x}^{(n)}\prec^{w}\sigma_{y}^{(n)}$. Also, $1/x^{2}$ is a symmetric function that is convex and decreasing on $\mathbb{R}_{+}$. Therefore Proposition 6 applies giving

\[ \sum_{k=1}^{m}\left(\frac{\beta^{nm+k-1}}{(x_{nm+k}(x))^{2}}-\frac{\beta^{nm+k-1}}{(y_{nm+k}(x))^{2}}\right)\geq 0\qquad\text{for any $n\in\mathbb{Z}_{+}$ where $m:=|01w|$.}\qquad(12) \]

Also Proposition 2 shows that the $x$-threshold orbits are $(\phi_{u_{1}}(x),\dots,\phi_{u_{1:k}}(x),\dots)$ and $(\phi_{l_{1}}(x),\dots,\phi_{l_{1:k}}(x),\dots)$ where $u:=(01w)^{\omega}$ and $l:=(10w)^{\omega}$. So the denominator of (6) is

\[ \sum_{k=0}^{\infty}\beta^{k}(\mathbf{1}_{l_{k+1}=1}-\mathbf{1}_{u_{k+1}=1})=\sum_{k=0}^{\infty}\beta^{mk}(1-\beta)\quad\Rightarrow\quad\lambda(x)=\frac{1-\beta^{m}}{1-\beta}\sum_{k=1}^{\infty}\beta^{k-1}(\phi_{u_{1:k}}(x)-\phi_{l_{1:k}}(x)). \]
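The denominator identity can be checked numerically. The words $l$ and $u$ differ only in the first two letters of each period, so each period contributes $\beta^{mk}(1-\beta)$ and the series sums to $(1-\beta)/(1-\beta^{m})$. The sketch below truncates the infinite sum after a fixed number of periods; the function name, the choice of $w$ and the truncation length are illustrative assumptions.

```python
def indicator_sum(w, beta, periods=200):
    """Truncated sum of beta^k * (1[l_{k+1}=1] - 1[u_{k+1}=1]) over whole periods."""
    l, u = "10" + w, "01" + w  # one period of each infinite word
    m = len(l)
    total = 0.0
    for k in range(periods * m):
        # Python booleans subtract as integers, giving +1, 0 or -1.
        total += beta**k * ((l[k % m] == "1") - (u[k % m] == "1"))
    return total

beta, w = 0.9, "010"
m = len(w) + 2
closed_form = (1 - beta) / (1 - beta**m)
print(indicator_sum(w, beta), closed_form)
```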

Note that $\frac{d}{dx}\frac{ex+f}{gx+h}=\frac{1}{(gx+h)^{2}}$ for any $eh-fg=1$. Then (12) gives

\[ \frac{d\lambda(x)}{dx}=\frac{1-\beta^{m}}{1-\beta}\sum_{n=0}^{\infty}\sum_{k=1}^{m}\left(\frac{\beta^{nm+k-1}}{(x_{nm+k}(x))^{2}}-\frac{\beta^{nm+k-1}}{(y_{nm+k}(x))^{2}}\right)\geq 0. \]
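The derivative identity for a linear-fractional map used in this step is an instance of the quotient rule. Spelled out, with $e,f,g,h$ as in the text:

```latex
\frac{d}{dx}\,\frac{ex+f}{gx+h}
  = \frac{e(gx+h)-g(ex+f)}{(gx+h)^{2}}
  = \frac{eh-fg}{(gx+h)^{2}}
  = \frac{1}{(gx+h)^{2}}
  \qquad\text{when } eh-fg=1 .
```

The $x$-dependent terms $egx$ cancel in the numerator, which is why only the determinant $eh-fg$ survives.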

But $\lambda(x)$ is continuous for $x\in\mathbb{R}_{+}$ (as shown in the supplementary material). Therefore we conclude that $\lambda(x)$ is non-decreasing for $x\in\mathbb{R}_{+}$. ∎

5 Further Work

One might attempt to prove that assumption A1 holds using general results about monotone optimal policies for two-action MDPs based on submodularity [2] or multimodularity [1]. However, we find counter-examples to the required submodularity condition. Rather, we are optimistic that the ideas of this paper themselves offer an alternative approach to proving A1. It would then be natural to extend our results to settings where the underlying state evolves as $Z_{t+1}\mid\mathcal{H}_{t}\sim\mathcal{N}(mZ_{t},1)$ for some multiplier $m\neq 1$ and to cost functions other than the variance. Finally, the question of the indexability of the discrete-time Kalman filter in multiple dimensions remains open.

References

  • [1] E. Altman, B. Gaujal, and A. Hordijk. Multimodularity, convexity, and optimization properties. Mathematics of Operations Research, 25(2):324–347, 2000.
  • [2] E. Altman and S. Stidham Jr. Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems, 21(3-4):267–291, 1995.
  • [3] M. Araya, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards. In Neural Information Processing Systems, pages 64–72, 2010.
  • [4] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 671–680, 2014.
  • [5] J. Berstel, A. Lauve, C. Reutenauer, and F. Saliola. Combinatorics on Words: Christoffel Words and Repetitions in Words. CRM Monograph Series, 2008.
  • [6] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundation and Trends in Machine Learning, Vol. 5. NOW, 2012.
  • [7] Y. Chen, H. Shioi, C. Montesinos, L. P. Koh, S. Wich, and A. Krause. Active detection via adaptive submodularity. In Proceedings of The 31st International Conference on Machine Learning, pages 55–63, 2014.
  • [8] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.
  • [9] R. Graham, D. Knuth, and O. Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, 1994.
  • [10] S. Guha, K. Munagala, and P. Shi. Approximation algorithms for restless bandit problems. Journal of the ACM, 58(1):3, 2010.
  • [11] B. La Scala and B. Moran. Optimal target tracking with restless bandits. Digital Signal Processing, 16(5):479–487, 2006.
  • [12] J. Le Ny, E. Feron, and M. Dahleh. Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control, 56(6):1381–1394, 2011.
  • [13] M. Lothaire. Algebraic combinatorics on words. Cambridge University Press, 2002.
  • [14] A. Marshall, I. Olkin, and B. Arnold. Inequalities: Theory of majorization and its applications. Springer Science & Business Media, 2010.
  • [15] L. Meier, J. Peschon, and R. Dressler. Optimal control of measurement subsystems. IEEE Trans. Automatic Control, 12(5):528–536, 1967.
  • [16] J. Niño-Mora and S. Villar. Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. In Proceedings of the 48th IEEE Conference on Decision and Control, pages 2905–2910, 2009.
  • [17] R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In Algorithmic Learning Theory, pages 214–228. Springer, 2012.
  • [18] B. Rajpathak, H. Pillai, and S. Bandyopadhyay. Analysis of stable periodic orbits in the one dimensional linear piecewise-smooth discontinuous map. Chaos, 22(3):033126, 2012.
  • [19] T. Thiele. Sur la compensation de quelques erreurs quasi-systématiques par la méthode des moindres carrés. CA Reitzel, 1880.
  • [20] I. Verloop. Asymptotic optimal control of multi-class restless bandits. CNRS Technical Report, hal-00743781, 2014.
  • [21] S. Villar. Restless bandit index policies for dynamic sensor scheduling optimization. PhD thesis, Statistics Department, Universidad Carlos III de Madrid, 2012.
  • [22] E. Vul, G. Alvarez, J. B. Tenenbaum, and M. J. Black. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. In Neural Information Processing Systems, pages 1955–1963, 2009.
  • [23] R. R. Weber and G. Weiss. On an index policy for restless bandits. Journal of Applied Probability, pages 637–648, 1990.
  • [24] P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, pages 287–298, 1988.

6 Supplementary Material: Introduction

The results used but not proved in the main paper are given here as:

  • Proposition 9, which was used to show that $\phi_{w}(0)\leq x$,

  • Proposition 16 for the range of $x$ giving a specific mechanical word,

  • Proposition 17 showing the index is continuous for $x\in\mathbb{R}_{+}$,

  • Proposition 19 showing the properties of $M(p)$ when $p$ is a palindrome,

  • and Proposition 20 for weak supermajorisation with $\beta\neq 1$.

A clarification of the extreme cases of Theorem 1 of the main paper is presented in the final section.

7 From $x$-Threshold Policies to Mechanical Words

Some concepts relating to mechanical words appeared as early as 1771 in Jean Bernoulli’s study of continued fractions (Berstel et al, 2008). The term “mechanical sequences” appears in the work of Morse and Hedlund (Am. J. Math., Vol 62, No. 1, 1940, p. 1-42) who had just introduced the term “symbolic dynamics”. Morse and Hedlund studied the concept from the perspective of sequences of the form $\lfloor c+k\beta\rfloor$ for $c,\beta\in\mathbb{R}$ and $k\in\mathbb{Z}$. They also studied the concept from the perspective of differential equations, motivating the term “Sturmian sequences.” Since that time there has been tremendous progress in the study of such sequences from the perspective of Combinatorics on Words (Lothaire, 2002). However, the recent (and highly approachable) paper of Rajpathak, Pillai and Bandyopadhyay (Chaos, Vol. 22, 2012) on the piecewise-linear map-with-a-gap discovers such sequences without recognising them as mechanical sequences. Proposition 16 of this section is a substantial generalisation of that result and we could not find this proposition explicitly stated in the literature. Our result is not surprising if one has the intuition that there is a topological conjugacy between the maps of this section and the piecewise-linear map-with-a-gap. However, it might be difficult to explicitly identify the appropriate topological conjugacy and thereby prove our result for all cases considered here.

7.1 Definitions

Let $\pi$ denote a word consisting of a string of 0s and 1s in which the $k^{th}$ letter is $\pi_{k}$ and letters $i,i+1,\dots,j$ are $\pi_{i:j}$. Let $|\pi|$ be the length of $\pi$ and $|\pi|_{w}$ for a word $w$ be the number of times that word $w$ appears in $\pi$. Let $\epsilon$ denote the empty word and $\pi^{\omega}$ denote the infinite word constructed by repeatedly concatenating $\pi$.

Consider two functions $\phi_{0}:\mathcal{I}\rightarrow\mathcal{I}$ and $\phi_{1}:\mathcal{I}\rightarrow\mathcal{I}$ where $\mathcal{I}$ is an interval of $\mathbb{R}$. We define the transformation $\phi_{\pi}:\mathcal{I}\rightarrow\mathcal{I}$ for any word $\pi$ by the composition

\[ \phi_{\pi}(x):=\phi_{\pi_{|\pi|}}\circ\cdots\circ\phi_{\pi_{2}}\circ\phi_{\pi_{1}}(x). \]

Let $y_{\pi}\in\mathcal{I}$ be the fixed point of $\phi_{\pi}$, so $\phi_{\pi}(y_{\pi})=y_{\pi}$, assuming a unique fixed point on $\mathcal{I}$ exists.

Given $x\in\mathcal{I}$, we call the sequence $(x_{k}:k\geq 1)$ the $x$-threshold orbit for $\phi_{0},\phi_{1}$ if

\[ x_{1}=\phi_{1}(x),\qquad x_{k+1}=\begin{cases}\phi_{1}(x_{k})&\text{if $x_{k}\geq x$}\\ \phi_{0}(x_{k})&\text{if $x_{k}<x$}\end{cases}\qquad\text{for $k\geq 1$}. \]

We call $\pi$ the $x$-threshold word for $\phi_{0},\phi_{1}$ if it is the shortest word such that $x_{k+1}=\phi_{(\pi^{\omega})_{k}}(x_{k})$ for all $k\geq 1$. We shall just write $x$-threshold orbit and $x$-threshold word where $\phi_{0},\phi_{1}$ are obvious from the context.
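The definitions of the $x$-threshold orbit and word can be sketched in Python as follows. The two affine maps are hypothetical stand-ins, chosen only because they are increasing, strictly non-expansive and have fixed points $y_{1}=0<y_{0}=2$; they are not the maps of the main paper. The word is estimated as the shortest period of a finite prefix of the itinerary, so it is a finite-horizon approximation of the definition rather than an exact computation.

```python
def phi0(x):  # hypothetical map with fixed point y0 = 2
    return 0.5 * x + 1.0

def phi1(x):  # hypothetical map with fixed point y1 = 0
    return 0.5 * x

def threshold_letters(x, n_letters=60):
    """Letters applied along the x-threshold orbit: letter k maps x_k to x_{k+1}."""
    xk = phi1(x)  # x_1 = phi_1(x)
    letters = []
    for _ in range(n_letters):
        if xk >= x:
            letters.append("1")
            xk = phi1(xk)
        else:
            letters.append("0")
            xk = phi0(xk)
    return "".join(letters)

def shortest_period(s):
    """Shortest prefix p of s such that s is a prefix of p repeated."""
    for p in range(1, len(s) + 1):
        if all(s[i] == s[i - p] for i in range(p, len(s))):
            return s[:p]

print(shortest_period(threshold_letters(1.0)))
print(shortest_period(threshold_letters(0.5)))
```

For these particular maps the thresholds 1.0 and 0.5 give the words 01 and 011 respectively.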

For $p\geq 1$, let $L_{p},R_{p}$ be the morphisms (substitutions)

\[ L_{p}:\begin{cases}0\rightarrow 0^{p+1}1\\ 1\rightarrow 0^{p}1\end{cases}\qquad R_{p}:\begin{cases}0\rightarrow 01^{p}\\ 1\rightarrow 01^{p+1}\end{cases}. \]

We say $\pi$ is a valid word if $\pi\in\{0,1\}$ or $\pi\in\{L_{p}(w),R_{p}(w):p\geq 1\}$ for some valid word $w$.
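As an illustration of these definitions, the sketch below applies $L_{p}$ and $R_{p}$ letter by letter and enumerates the valid words reachable within a bounded number of substitutions. The bounds `depth` and `max_p` are illustrative parameters introduced only to keep the enumeration finite.

```python
def L(p, word):
    """Morphism L_p: 0 -> 0^(p+1) 1, 1 -> 0^p 1."""
    images = {"0": "0" * (p + 1) + "1", "1": "0" * p + "1"}
    return "".join(images[c] for c in word)

def R(p, word):
    """Morphism R_p: 0 -> 0 1^p, 1 -> 0 1^(p+1)."""
    images = {"0": "0" + "1" * p, "1": "0" + "1" * (p + 1)}
    return "".join(images[c] for c in word)

def valid_words(depth, max_p):
    """Valid words obtained from 0 and 1 with at most `depth` morphisms, p <= max_p."""
    words = {"0", "1"}
    for _ in range(depth):
        new = set()
        for w in words:
            for p in range(1, max_p + 1):
                new.add(L(p, w))
                new.add(R(p, w))
        words |= new
    return words

print(L(1, "01"), R(1, "01"))  # 00101 01011
print(sorted(valid_words(1, 2), key=lambda s: (len(s), s)))
```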

Remark. The morphisms $L_{p},R_{p}$ generate the Christoffel tree so valid words are mechanical words. To see this, note that the Christoffel tree is generated by the following morphisms (Berstel et al, 2008, p. 37)

\[ G:\begin{cases}0\rightarrow 0\\ 1\rightarrow 01\end{cases}\qquad\tilde{D}:\begin{cases}0\rightarrow 01\\ 1\rightarrow 1\end{cases}. \]

We may translate (from English to French) as $L_{p}=G^{p}\circ\tilde{D}$ and $R_{p}=\tilde{D}^{p}\circ G$ so any composition of $L_{p}$ and $R_{p}$ can be written as a composition of $G$ and $\tilde{D}$. Likewise, any composition of $G$ and $\tilde{D}$ can be written as a composition of $L_{p}$ and $R_{p}$. Specifically if $p_{k},q_{k},p_{k+1}\geq 2$ then

\[ \begin{aligned} &\cdots\circ G^{p_{k}-1}\circ\tilde{D}^{q_{k}}\circ G^{p_{k+1}}\circ\tilde{D}\circ\cdots\\ &\quad=\cdots\circ(G^{p_{k}-1}\circ\tilde{D})\circ(\tilde{D}^{q_{k}-1}\circ G)\circ(G^{p_{k+1}-1}\circ\tilde{D})\circ\cdots\\ &\quad=\cdots\circ L_{p_{k}-1}\circ R_{q_{k}-1}\circ L_{p_{k+1}-1}\circ\cdots \end{aligned} \]

whereas if $q_{k}=1$ we have

\[ \begin{aligned} &\cdots\circ G^{p_{k}-1}\circ\tilde{D}\circ G^{p_{k+1}}\circ\tilde{D}\circ\cdots\\ &\quad=\cdots\circ(G^{p_{k}-1}\circ\tilde{D})\circ(G^{p_{k+1}}\circ\tilde{D})\circ\cdots\\ &\quad=\cdots\circ L_{p_{k}-1}\circ L_{p_{k+1}}\circ\cdots. \end{aligned} \]

A symmetric argument holds if $p_{k}=1$ or $p_{k+1}=1$.
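The two translation identities in this remark, $L_{p}=G^{p}\circ\tilde{D}$ and $R_{p}=\tilde{D}^{p}\circ G$, can be checked mechanically. Since a morphism is determined by its images of the letters 0 and 1, it suffices to compare both sides on each letter. The helper names below are illustrative.

```python
def apply(morphism, word):
    """Apply a letter-to-word substitution to every letter of `word`."""
    return "".join(morphism[c] for c in word)

G = {"0": "0", "1": "01"}
D = {"0": "01", "1": "1"}  # plays the role of D-tilde

def power(morphism, p, word):
    """Apply the same morphism p times."""
    for _ in range(p):
        word = apply(morphism, word)
    return word

def L(p, word):
    return apply({"0": "0" * (p + 1) + "1", "1": "0" * p + "1"}, word)

def R(p, word):
    return apply({"0": "0" + "1" * p, "1": "0" + "1" * (p + 1)}, word)

def identities_hold(p):
    """Check L_p = G^p o D-tilde and R_p = D-tilde^p o G on both letters."""
    return all(
        L(p, c) == power(G, p, apply(D, c)) and R(p, c) == power(D, p, apply(G, c))
        for c in "01"
    )

print(all(identities_hold(p) for p in range(1, 6)))
```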

7.2 Fixed Points

Throughout, we make the following assumption about $\phi_{0},\phi_{1}$. The existence of fixed points $y_{0},y_{1}$ is addressed immediately thereafter.

Assumption A2. Functions $\phi_{0}:\mathcal{I}\rightarrow\mathcal{I},\phi_{1}:\mathcal{I}\rightarrow\mathcal{I}$, where $\mathcal{I}$ is an interval of $\mathbb{R}$, are increasing and non-expansive. Equivalently, for all $x,y\in\mathcal{I}:x<y$ and for $k\in\{0,1\}$ we have

\[ \underbrace{\phi_{k}(x)<\phi_{k}(y)}_{\text{increasing}}\qquad\text{and}\qquad\underbrace{\phi_{k}(y)-\phi_{k}(x)<y-x}_{\text{non-expansive}}. \]

Furthermore, the fixed points $y_{0},y_{1}$ of $\phi_{0},\phi_{1}$ satisfy $y_{1}<y_{0}$.

Proposition 8.

Suppose A2 holds, that $x\in\mathcal{I}$ and that $w$ is any non-empty word. Then $\phi_{w}(x)$ is increasing and non-expansive. Further, the fixed point $y_{w}$ exists and is unique.

Proof.

First we show that $\phi_{w}(x)$ is increasing, by induction. In the base case, $|w|=1$ and the claim follows from A2. For the inductive step assume $\phi_{u}(x)$ is increasing, where $w=au$ for some $a\in\{0,1\}$ and word $u$. Then for any $x,y\in\mathcal{I}:x<y$,

\[ \begin{aligned} \phi_{w}(y)&=\phi_{u}(\phi_{a}(y))\\ &>\phi_{u}(\phi_{a}(x))&&\text{as $\phi_{a}(y)>\phi_{a}(x)$ and $\phi_{u}$ is increasing}\\ &=\phi_{w}(x). \end{aligned} \]

Therefore $\phi_{w}$ is increasing.

Now we show that $\phi_{w}(x)$ is non-expansive, by induction. If $|w|=1$ then this follows from A2. Else, say $\phi_{u}(x)$ is non-expansive where $w=ua$ and $a\in\{0,1\}$. Then for any $x,y\in\mathcal{I}:x<y$,

\[ \begin{aligned} \phi_{w}(y)-\phi_{w}(x)&=\phi_{a}(\phi_{u}(y))-\phi_{a}(\phi_{u}(x))\\ &<\phi_{u}(y)-\phi_{u}(x)&&\text{as $\phi_{u}(y)>\phi_{u}(x)$ and $\phi_{a}$ is non-expansive}\\ &<y-x&&\text{as $\phi_{u}$ is non-expansive.} \end{aligned} \]

Therefore $\phi_{w}$ is non-expansive.

Let $\psi(x):=\max\{\phi_{0}(x),\phi_{1}(x)\}$. As $\phi_{1}$ is non-expansive and $y_{1}<y_{0}$ we have

\[ y_{1}=\phi_{1}(y_{1})>\phi_{1}(y_{0})+y_{1}-y_{0} \]

which rearranges to give $\phi_{1}(y_{0})<y_{0}$, so that $\psi(y_{0})=\phi_{0}(y_{0})=y_{0}$. Also $\psi$ is increasing as $\phi_{0},\phi_{1}$ are increasing, so $\phi_{w}(y_{0})\leq\psi^{(|w|)}(y_{0})=y_{0}$.

We now prove that $y_{w}$ exists. The argument of the previous paragraph shows that $g(x):=x-\phi_{w}(x)$ satisfies $g(y_{0})\geq 0$. A symmetric argument leads to the conclusion that $g(y_{1})\leq 0$. Clearly $g(x)$ is a continuous function, so by the intermediate value theorem, there is some $y\in[y_{1},y_{0}]$ for which $g(y)=0$. Equivalently $y=\phi_{w}(y)$. Therefore a fixed point $y_{w}$ exists.

To show that the fixed point is unique, suppose both $y$ and $z$ are fixed points with $y>z$. As $\phi_{w}$ is non-expansive we have $\frac{\phi_{w}(y)-\phi_{w}(z)}{y-z}<1$. Yet, as $\phi_{w}(y)=y,\phi_{w}(z)=z$ we have

\[ \frac{\phi_{w}(y)-\phi_{w}(z)}{y-z}=1. \]

This is a contradiction. Therefore the fixed point is unique. ∎
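The existence argument is constructive enough to suggest a numerical method: $g(x)=x-\phi_{w}(x)$ changes sign between $y_{1}$ and $y_{0}$, so bisection locates $y_{w}$. The sketch below assumes two hypothetical affine maps satisfying A2 with $y_{1}=0$ and $y_{0}=2$; the maps, the bracket and the iteration count are illustrative and not taken from the paper.

```python
def phi0(x):  # hypothetical map, fixed point y0 = 2
    return 0.5 * x + 1.0

def phi1(x):  # hypothetical map, fixed point y1 = 0
    return 0.5 * x

def phi(word, x):
    """phi_word(x): the first letter of `word` is applied first."""
    for c in word:
        x = phi1(x) if c == "1" else phi0(x)
    return x

def fixed_point(word, lo=0.0, hi=2.0, iters=100):
    """Bisection on g(x) = x - phi_word(x), with g(lo) <= 0 <= g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - phi(word, mid) <= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(fixed_point("01"), fixed_point("10"))
```

For these maps $\phi_{01}(x)=0.25x+0.5$ and $\phi_{10}(x)=0.25x+1$, so the two values should be close to $2/3$ and $4/3$.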

Given a word $w$, the next proposition shows when the transformation $\phi_{w}$ increases or decreases its argument and what might be deduced from such an increase or decrease.

Proposition 9.

Suppose A2 holds, $x\in\mathcal{I}$ and $w$ is any non-empty word. Then

\[ x<\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)<y_{w}\ \Leftrightarrow\ x<y_{w}\qquad\text{and}\qquad x>\phi_{w}(x)\ \Leftrightarrow\ \phi_{w}(x)>y_{w}\ \Leftrightarrow\ x>y_{w}. \]
Proof.

We use Proposition 8 throughout the argument without further mention.

Say $x<y_{w}$. As $\phi_{w}$ is increasing,

\[ \phi_{w}(x)<\phi_{w}(y_{w})=y_{w} \]

where the equality is the definition of $y_{w}$. Also, as $\phi_{w}$ is non-expansive,

\[ y_{w}=\phi_{w}(y_{w})<\phi_{w}(x)+y_{w}-x \]

which rearranges to give $x<\phi_{w}(x)$.

Now say $x>y_{w}$. As above, we then have $\phi_{w}(x)>\phi_{w}(y_{w})=y_{w}$ and

\[ y_{w}=\phi_{w}(y_{w})>\phi_{w}(x)+y_{w}-x \]

so that $x>\phi_{w}(x)$.

The contrapositive of $x>y_{w}\Rightarrow\phi_{w}(x)>y_{w}$ is $\phi_{w}(x)\leq y_{w}\Rightarrow x\leq y_{w}$. But if $\phi_{w}(x)\neq y_{w}$ then $x\neq y_{w}$ as $\phi_{w}$ is increasing and therefore injective. Thus $\phi_{w}(x)<y_{w}\Rightarrow x<y_{w}$.

The contrapositive of $x>y_{w}\Rightarrow x>\phi_{w}(x)$ is $x\leq\phi_{w}(x)\Rightarrow x\leq y_{w}$. But if $x\neq\phi_{w}(x)$ then $x\neq y_{w}$ as $y_{w}$ is a fixed point. So we can conclude that $x<\phi_{w}(x)\Rightarrow x<y_{w}$.

By symmetry, $\phi_{w}(x)>y_{w}\Rightarrow x>y_{w}$ and $x>\phi_{w}(x)\Rightarrow x>y_{w}$. This completes the proof. ∎

Proposition 10.

Suppose A2 holds and $\pi$ is any word satisfying $|\pi|_{0}|\pi|_{1}>0$. Then $y_{1}<y_{\pi}<y_{0}$.

Proof.

Say $y_{\pi}\leq y_{1}$. As $|\pi|_{0}>0$ we can write $\pi=:s01^{q}$ for some $q\geq 0$. Thus

\[ \begin{aligned} y_{\pi}=\phi_{\pi}(y_{\pi})&\leq\phi_{s01^{q}}(y_{1})&&\text{as $\phi_{\pi}$ is increasing}\\ &=\phi_{s0}(y_{1})&&\text{as $\phi_{\epsilon}(y_{1})=\phi_{1}(y_{1})=y_{1}$}\\ &>\phi_{s}(y_{1})&&\text{by Proposition 9}\\ &\geq y_{1}&&\text{by repeating the same argument if $|s|_{0}>0$.} \end{aligned} \]

But this contradicts $y_{\pi}\leq y_{1}$. Therefore $y_{\pi}>y_{1}$.

A symmetrical argument leads to the conclusion that $y_{\pi}<y_{0}$. ∎

Proposition 11.

If A2 holds and $n\geq 1$ then $y_{10^{n-1}}<y_{010^{n-1}}<y_{10^{n}}$ and $y_{01^{n}}<y_{101^{n-1}}<y_{01^{n-1}}$.

Proof.

As $y_{10^{n-1}}<y_{0}$ by Proposition 10 we have $\phi_{0}(y_{10^{n-1}})>y_{10^{n-1}}$ by Proposition 9 so that

\[ \phi_{010^{n-1}}(y_{10^{n-1}})=\phi_{10^{n-1}}(\phi_{0}(y_{10^{n-1}}))>\phi_{10^{n-1}}(y_{10^{n-1}})=y_{10^{n-1}} \]

so Proposition 9 gives $y_{010^{n-1}}>y_{10^{n-1}}$.

Furthermore $y_{10^{n}}=\phi_{0}(y_{010^{n-1}})$ by definition of $y_{\pi}$ and $y_{010^{n-1}}<y_{0}$ by Proposition 10 so that $\phi_{0}(y_{010^{n-1}})>y_{010^{n-1}}$ by Proposition 9. Thus $y_{10^{n}}>y_{010^{n-1}}$.

The proof that $y_{01^{n}}<y_{101^{n-1}}<y_{01^{n-1}}$ is symmetrical. ∎
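Both orderings of Proposition 11 can be sanity-checked on a concrete pair of maps. The sketch below assumes hypothetical affine maps $\phi_{c}(x)=a_{c}x+b_{c}$ satisfying A2 (fixed points $y_{1}=0<y_{0}=2$), for which the fixed point of any composition has a closed form. This only illustrates the statement on one example and is no substitute for the proof.

```python
A = {"0": (0.5, 1.0), "1": (0.5, 0.0)}  # hypothetical phi_c(x) = a*x + b

def fixed_point(word):
    """Fixed point of phi_word, composing affine maps with the first letter applied first."""
    a, b = 1.0, 0.0
    for c in word:
        ac, bc = A[c]
        a, b = ac * a, ac * b + bc
    return b / (1.0 - a)

def chains_hold(n):
    """Check both chains of inequalities of Proposition 11 for a given n >= 1."""
    y = fixed_point
    first = y("1" + "0" * (n - 1)) < y("01" + "0" * (n - 1)) < y("1" + "0" * n)
    second = y("0" + "1" * n) < y("10" + "1" * (n - 1)) < y("0" + "1" * (n - 1))
    return first and second

print(all(chains_hold(n) for n in range(1, 8)))
```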

Proposition 12.

Suppose A2 holds, $M\in\{L_{q},R_{q}:q\geq 1\}$ and $\tilde{w}$ is any word. Let $\tilde{y}_{v}$ be the fixed point of $\tilde{\phi}_{v}:=\phi_{M(v)}$ for any word $v$ and let $0w1:=M(0\tilde{w}1)$. Then

\[ \tilde{x}\in[\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}]\ \Leftrightarrow\ x:=\phi_{0^{q}}(\tilde{x})\in[y_{01w},y_{10w}]. \]

Proof.

Say $M=L_{q}$. Note that

\[ \begin{aligned} \phi_{0^{q}}(\tilde{y}_{01\tilde{w}})&=\phi_{0^{q}}(y_{L_{q}(01\tilde{w})})&&\text{as $\tilde{y}_{v}$ is the fixed point of $\tilde{\phi}_{v}=\phi_{L_{q}(v)}$}\\ &=\phi_{0^{q}}(y_{0^{q}01L_{q}(1\tilde{w})})&&\text{as $L_{q}(0)=0^{q}01$}\\ &=y_{01L_{q}(1\tilde{w})0^{q}}&&\text{as $\phi_{a}(y_{ab})=y_{ba}$ for any words $a,b$}\\ &=y_{01w}&&\text{as $0w1=L_{q}(0\tilde{w}1)=0L_{q}(1\tilde{w})0^{q}1$} \end{aligned} \]

and

\[ \begin{aligned} \phi_{0^{q}}(\tilde{y}_{10\tilde{w}})&=\phi_{0^{q}}(y_{L_{q}(10\tilde{w})})\\ &=\phi_{0^{q}}(y_{0^{q}1L_{q}(0\tilde{w})})\\ &=y_{1L_{q}(0\tilde{w})0^{q}}\\ &=y_{10w}&&\text{as $0w1=L_{q}(0\tilde{w})0^{q}1$.} \end{aligned} \]

Proposition 8 shows that $\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}$ exist. So the above equalities show that an inverse $\phi^{(-1)}_{0^{q}}(x)$ exists for $x\in\{y_{01w},y_{10w}\}$. As $\phi_{0^{q}}$ is increasing and continuous, we have

\[ x\in[y_{01w},y_{10w}]\ \Leftrightarrow\ \tilde{x}\in[\phi^{(-1)}_{0^{q}}(y_{01w}),\phi^{(-1)}_{0^{q}}(y_{10w})]=[\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}]. \]

The proof for $M=R_{q}$ is symmetric. ∎
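The endpoint identities in this proof, $\phi_{0^{q}}(\tilde{y}_{01\tilde{w}})=y_{01w}$ and $\phi_{0^{q}}(\tilde{y}_{10\tilde{w}})=y_{10w}$, rest on the rotation rule $\phi_{a}(y_{ab})=y_{ba}$. The sketch below checks them numerically for $M=L_{q}$ with hypothetical affine maps satisfying A2; the maps, $q$ and $\tilde{w}$ are arbitrary illustrative choices.

```python
A = {"0": (0.5, 1.0), "1": (0.5, 0.0)}  # hypothetical phi_c(x) = a*x + b

def phi(word, x):
    for c in word:  # first letter applied first
        a, b = A[c]
        x = a * x + b
    return x

def fixed_point(word):
    # phi_word is affine, so its slope and intercept follow from two evaluations.
    a, b = phi(word, 1.0) - phi(word, 0.0), phi(word, 0.0)
    return b / (1.0 - a)

def L(q, word):
    images = {"0": "0" * (q + 1) + "1", "1": "0" * q + "1"}
    return "".join(images[c] for c in word)

q, w_tilde = 2, "0"
w = L(q, "0" + w_tilde + "1")[1:-1]  # strip outer letters of 0w1 = L_q(0 w~ 1)
lower = phi("0" * q, fixed_point(L(q, "01" + w_tilde)))
upper = phi("0" * q, fixed_point(L(q, "10" + w_tilde)))
print(abs(lower - fixed_point("01" + w)), abs(upper - fixed_point("10" + w)))
```

Both printed differences should be at the level of floating-point rounding.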

7.3 $x$-Threshold Words

Proposition 13.

Suppose A2 holds, $\pi$ is the $x$-threshold word and $n\geq 1$. Then

  1. $x\leq y_{10^{n-1}}\Rightarrow|\pi^{\omega}|_{0^{n}}=0$

  2. $x\geq y_{010^{n-1}}\Rightarrow|\pi^{\omega}|_{10^{n-1}1}=0$

  3. $x\geq y_{01^{n-1}}\Rightarrow|\pi^{\omega}|_{1^{n}}=0$

  4. $x\leq y_{101^{n-1}}\Rightarrow|\pi^{\omega}|_{01^{n-1}0}=0$

Proof.

If $x\leq y_{1}$ then it follows from Proposition 9 that the $x$-threshold word is $\pi=1$. Likewise if $x>y_{0}$ then the $x$-threshold word is $\pi=0$. In these cases Claims 1 and 2 hold, so in the following we assume that $y_{1}<x\leq y_{0}$.

Claim 1: Let $(x_{k})$ be the $x$-threshold orbit. If $(\pi^{\omega})_{k:k+n-2}=0^{n-1}$ for some $k$, then

\[ \begin{aligned} x_{k+n-1}&=\phi_{0^{n-1}}(x_{k})&&\text{by definition of $(x_{k})$}\\ &\geq\phi_{0^{n-1}}(\phi_{1}(x))&&\text{as $x_{k}\geq\phi_{1}(x)$ for all $k\geq 0$ and $\phi_{0^{n-1}}$ is increasing}\\ &=\phi_{10^{n-1}}(x)\\ &\geq x&&\text{if $x\leq y_{10^{n-1}}$ by Proposition 9.} \end{aligned} \]

But if $x_{k+n-1}\geq x$ then $(\pi^{\omega})_{k+n-1}=1$ by definition of $\pi$. Therefore $|\pi^{\omega}|_{0^{n}}=0$.

Claim 2: Let $(x_{k})$ be the $x$-threshold orbit. If $(\pi^{\omega})_{k:k+n-1}=10^{n-1}$ for some $k$, then

\[ \begin{aligned} x_{k+n}&=\phi_{10^{n-1}}(x_{k})\\ &<\phi_{10^{n-1}}(\phi_{0}(x))&&\text{as $x_{k}<\phi_{0}(x)$ for all $k\geq 0$ and $\phi_{10^{n-1}}$ is increasing}\\ &=\phi_{010^{n-1}}(x)\\ &\leq x&&\text{if $x\geq y_{010^{n-1}}$ by Proposition 9.} \end{aligned} \]

But if $x_{k+n}<x$ then $(\pi^{\omega})_{k+n}=0$. Therefore $|\pi^{\omega}|_{10^{n-1}1}=0$.

The proof of Claims 3 and 4 is symmetrical. ∎

Proposition 14.

Suppose A2 holds and $\pi$ is an $x$-threshold word. Then

  1. $|\pi|_{00}>0\Rightarrow\pi=L_{n}(w)$ for some word $w$ and some $n\geq 1$

  2. $|\pi|_{11}>0\Rightarrow\pi=R_{n}(w)$ for some word $w$ and some $n\geq 1$

Proof.

First, applying Claims 1 and 3 of Proposition 13 with $n=2$ we have $|\pi|_{00}=0$ for $x\leq y_{10}$ and $|\pi|_{11}=0$ for $x\geq y_{01}$. Furthermore $y_{10}=\phi_{0}(y_{01})>y_{01}$ by Proposition 9. Thus $\pi$ cannot contain both $00$ and $11$.

So, if $|\pi|_{00}>0$ then $\pi$ is of the form $0^{q_{1}}10^{q_{2}}1\dots$ with strings of 0s separated by individual 1s. Let $q:=\min_{k}q_{k}$. By Propositions 11 and 13, $I_{q}:=(y_{10^{q-1}},y_{010^{q}})$ is the only set of $x$ values for which $\pi^{\omega}$ can contain $10^{q}1$. Thus $\pi^{\omega}$ can only contain both $10^{q}1$ and $10^{q+1}1$ in the interval

\[ F_{q}:=I_{q}\cap I_{q+1}=(y_{10^{q-1}},y_{010^{q}})\cap(y_{10^{q}},y_{010^{q+1}})=(y_{10^{q}},y_{010^{q}}) \]

noting Proposition 11 gives $y_{10^{q-1}}<y_{010^{q-1}}<y_{10^{q}}<y_{010^{q}}$.

Finally, we have $F_{q}\cap F_{q^{\prime}}=\emptyset$ for $q\neq q^{\prime}$, which also follows from Proposition 11. Thus if $|\pi|_{00}>0$ then $\pi$ is a concatenation of $L_{q}(0)$ and $L_{q}(1)$. Equivalently $\pi=L_{q}(w)$ for some word $w$ and some $q\geq 1$ as in Claim 1.

The proof of Claim 2 is symmetric. ∎

Proposition 15.

Suppose A2 holds and $\pi$ is an $x$-threshold word. Then $\pi$ is a valid word.

Proof.

There are three cases to consider: either $|\pi|_{00}=|\pi|_{11}=0$ or $|\pi|_{00}>0$ or $|\pi|_{11}>0$.

First case: The only non-empty words not containing $00$ or $11$ are $0,1,(01)^{n},(10)^{n}$ for some $n\geq 1$. Now $x$-threshold words start with 0 unless $x\leq y_{1}$ (in which case $\pi=1$) so $\pi\neq(10)^{n}$. Further, the $x$-threshold word was defined to be the shortest word such that $x_{k+1}=\phi_{(\pi^{\omega})_{k}}(x_{k})$ so this leaves us with the options $0,1,01$. These are all valid words.

Second case: If $\pi$ contains $00$, we may write $\pi=L_{q}(w)$ for some word $w$, by Proposition 14. Now from point $x_{k}$ on the $x$-threshold orbit we have $\pi_{k:k+q}=0^{q+1}$ if and only if $\phi_{0^{q}}(x_{k})<x$ which corresponds to $x_{k}<\phi_{0}^{(-q)}(x)=:\tilde{x}$. So the word $w$ corresponds to a $\tilde{x}$-threshold orbit $(\tilde{x}_{k}:k\geq 1)$ for $\psi_{0}(x):=\phi_{0^{q+1}1}(x),\psi_{1}(x):=\phi_{0^{q}1}(x)$. To spell it out, we have

\[ \tilde{x}_{1}=\psi_{1}(\tilde{x}),\qquad\tilde{x}_{k+1}=\psi_{w_{k}}(\tilde{x}_{k}),\qquad w_{k}=\begin{cases}1&\text{if $\tilde{x}_{k}\geq\tilde{x}$}\\ 0&\text{if $\tilde{x}_{k}<\tilde{x}$}\end{cases}\qquad\text{for $k\geq 1$} \]

and as for the original system, we define $\tilde{y}_{\pi}$ as the fixed point $\tilde{y}_{\pi}=\psi_{\pi}(\tilde{y}_{\pi})$.

Now $\psi_{0},\psi_{1}$ are non-negative, as $\phi_{0},\phi_{1}$ are non-negative. Also $\psi_{0},\psi_{1}$ are monotonically increasing and non-expansive by Proposition 8. Further,

\[ \phi_{0^{q+1}1}(y_{0^{q}1})=\phi_{0^{q}1}(\phi_{0}(y_{0^{q}1}))>\phi_{0^{q}1}(y_{0^{q}1})=y_{0^{q}1} \]

so that $y_{0^{q+1}1}>y_{0^{q}1}$ by Proposition 9. But by definition $\tilde{y}_{0}=y_{0^{q+1}1}$ and $\tilde{y}_{1}=y_{0^{q}1}$, so that $\tilde{y}_{1}<\tilde{y}_{0}$. Therefore $\psi_{0},\psi_{1}$ satisfy A2.

Third case: We prove that $\pi=R_{q}(w)$ for some positive integer $q$ and word $w$. We also show that word $w$ is a $\hat{x}$-threshold word for a pair of functions (say) $\chi_{0},\chi_{1}$ which satisfy A2. The argument is symmetric to the second case, so it is omitted.

In conclusion, either

  1. $\pi\in\{0,1,L_{1}(1)\}$ which are valid words

  2. $\pi=L_{q}(w)$ where $w$ is a $\tilde{x}$-threshold word for $\psi_{0},\psi_{1}$ which satisfy Propositions 8-14 and therefore $w$ satisfies this conclusion

  3. or $\pi=R_{q}(w)$ where $w$ is a $\hat{x}$-threshold word for $\chi_{0},\chi_{1}$ which satisfy Propositions 8-14 and therefore $w$ satisfies this conclusion.

Thus $\pi$ is a valid word. This completes the proof. ∎

The following proposition shows that all valid words are $x$-threshold words and tells us explicitly which values of $x$ produce a given valid word. It is one of the key results of the main paper.

Proposition 16.

Suppose A2 is satisfied and $0w1$ is any valid word. Then

\[ \text{$0w1$ is the $x$-threshold word}\ \Leftrightarrow\ x\in[y_{01w},y_{10w}]. \]
Proof.

Let V1:={Lq(1),Rq(1):q1},Vn+1:={Lq(v),Rq(v):vVn,q1}V_{1}:=\{L_{q}(1),R_{q}(1):q\geq 1\},V_{n+1}:=\{L_{q}(v),R_{q}(v):v\in V_{n},q\geq 1\}. Note that V1V_{1} contains Lq(0)=0q+11=Lq+1(1)L_{q}(0)=0^{q+1}1=L_{q+1}(1) and Rq(0)=01qR_{q}(0)=01^{q} which for q2q\geq 2 equals Rq1(1)R_{q-1}(1) and for q=1q=1 equals 01=L1(1)01=L_{1}(1). Thus n=1Vn\cup_{n=1}^{\infty}V_{n} is the set of all valid words of form 0w10w1.

We use induction with hypothesis

Hn:0w1Vnis the x-threshold wordx[y01w,y10w]\displaystyle H_{n}:\quad 0w1\in V_{n}\ \text{is the $x$-threshold word}\ \Leftrightarrow\ x\in[y_{01w},y_{10w}]

Base case (H1H_{1}). Say 0w1=0q10w1=0^{q}1 is the xx-threshold word. Then

x\displaystyle x >ϕ(10q)n10q1(x)\displaystyle>\phi_{(10^{q})^{n}10^{q-1}}(x) for all n0n\geq 0
=ϕ(010q1)n(ϕ10q1(x))\displaystyle=\phi_{(010^{q-1})^{n}}(\phi_{10^{q-1}}(x))
x\displaystyle\Rightarrow\ x limnϕ(010q1)n(ϕ10q1(x))=y010q1.\displaystyle\geq\lim_{n\rightarrow\infty}\phi_{(010^{q-1})^{n}}(\phi_{10^{q-1}}(x))=y_{010^{q-1}}.

The definition of the xx-threshold word also gives xϕ10q(x)x\leq\phi_{10^{q}}(x). Therefore xy10qx\geq y_{10^{q}} by Proposition 9. Thus if 0q10^{q}1 is the xx-threshold word then x[y01w,y10w]x\in[y_{01w},y_{10w}].

Now say $x\in[y_{010^{q-1}},y_{10^{q}}]$. Proposition 10 gives $y_{0}<x<y_{1}$, so that the $x$-threshold orbit $(x_{k})$ is contained in $(y_{0},y_{1})$. So Proposition 9 shows that $\phi_{0}(x_{k})>x_{k}$ and $\phi_{1}(x_{k})<x_{k}$ for all $k\geq 0$. So to prove that the $x$-threshold word is $0^{q}1$ we need only show that $\phi_{(10^{q})^{n}10^{q-1}}(x)<x$ and $\phi_{(10^{q})^{n}}(x)\geq x$ for all $n\geq 0$. But if $x\geq y_{010^{q-1}}$ then for all $n\geq 0$

\[ \begin{aligned}
x &\geq\phi_{(010^{q-1})^{n}}(x) && \text{by Proposition 9}\\
&>\phi_{(010^{q-1})^{n}}(\phi_{10^{q-1}}(x)) && \text{as $y_{10^{q-1}}<y_{010^{q-1}}\leq x$ by Claim 3 of Proposition 11}\\
&=\phi_{(10^{q})^{n}10^{q-1}}(x).
\end{aligned} \]

Also if $x\leq y_{10^{q}}$ then $\phi_{(10^{q})^{n}}(x)\geq x$ for all $n\geq 0$ by Proposition 9. Therefore for $0w1=0^{q}1$, we have that $x\in[y_{01w},y_{10w}]$ implies that $0w1$ is the $x$-threshold word.

For $0w1=01^{q}$, the proof that $\pi=01^{q}\Leftrightarrow x\in[y_{01w},y_{10w}]$ is symmetric, so it is omitted.

Inductive Step. Assume $0\tilde{w}1$ satisfies $H_{n}$.

Say $0w1=L_{q}(0\tilde{w}1)$. Let $k_{i}:=|L_{q}(((0\tilde{w}1)^{\omega})_{1:i-1})|+1$ so that $(\pi^{\omega})_{k_{i}}$ is aligned with the start of the $i^{th}$ letter of $(0\tilde{w}1)^{\omega}$. Let $x_{k}:=\phi_{((10w)^{\omega})_{1:k}}(x)$, $\tilde{x}_{i}:=x_{k_{i}}$, $x=\phi_{0^{q}}(\tilde{x})$ and let $\tilde{y}_{v}$ denote the fixed point of $\tilde{\phi}_{v}:=\phi_{L_{q}(v)}$ for any word $v$. Then we have

\[ \begin{aligned}
& L_{q}(0\tilde{w}1)\ \text{is the $x$-threshold word for $\phi_{0},\phi_{1}$}\\
\Leftrightarrow\qquad & ((0w1)^{\omega})_{k_{i}:k_{i}+q}=0^{q+1}\ \text{if and only if}\ \phi_{0^{q}}(x_{k_{i}})<x\\
\Leftrightarrow\qquad & ((0\tilde{w}1)^{\omega})_{i}=0\ \text{if and only if}\ \tilde{x}_{i}<\tilde{x}\\
\Leftrightarrow\qquad & 0\tilde{w}1\ \text{is the $\tilde{x}$-threshold word for $\tilde{\phi}_{0},\tilde{\phi}_{1}$}\\
\Leftrightarrow\qquad & \tilde{x}\in[\tilde{y}_{01\tilde{w}},\tilde{y}_{10\tilde{w}}]\ \text{as $0\tilde{w}1$ satisfies $H_{n}$}\\
\Leftrightarrow\qquad & x\in[y_{01w},y_{10w}]\ \text{by Proposition 12.}
\end{aligned} \]

Symmetrically we may conclude that $\pi=0w1=R_{q}(0\tilde{w}1)\Leftrightarrow x\in[y_{01w},y_{10w}]$. Therefore $H_{n+1}$ is true.

This completes the proof. ∎
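As a numerical illustration of Proposition 16, the following sketch simulates a threshold orbit and compares the threshold with the two fixed points. It rests on assumptions that are not stated in this section: the maps are taken to be the Möbius maps of the matrices $F,G$ recalled in Section 11, that is $\phi_{c}(x)=(x+1)/(cx+c+1)$ with $c=a$ for letter 0 and $c=b$ for letter 1; words act on $x$ from left to right; and the rule "play 1 when $x_{k}\geq x$, otherwise play 0" is our reading of the threshold orbit defined in the main paper. All function names and parameter values are hypothetical.

```python
def phi(c, x):
    # Assumed Moebius map of the matrix [[1, 1], [c, 1 + c]].
    return (x + 1.0) / (c * x + c + 1.0)

def apply_word(word, x, a, b):
    # Apply the letters of `word` from left to right (assumed convention).
    for letter in word:
        x = phi(a if letter == "0" else b, x)
    return x

def fixed_point(word, a, b, iterations=200):
    # Fixed point of phi_word by plain iteration (the map is a contraction here).
    y = 1.0
    for _ in range(iterations):
        y = apply_word(word, y, a, b)
    return y

def threshold_itinerary(x, a, b, length):
    # Assumed threshold rule: letter 1 when the orbit is at or above x, else 0.
    letters, xk = [], x
    for _ in range(length):
        letter = "1" if xk >= x else "0"
        letters.append(letter)
        xk = phi(a if letter == "0" else b, xk)
    return "".join(letters)

a, b, x = 0.1, 1.0, 1.5            # hypothetical parameters with b > a >= 0
itinerary = threshold_itinerary(x, a, b, 12)
w = "0"                            # candidate valid word 0w1 = 001
lower = fixed_point("01" + w, a, b)
upper = fixed_point("10" + w, a, b)
print(itinerary, lower, upper)
```

Under these assumptions the itinerary repeats $10w$ with $w=0$, and the threshold $x$ lies between the two computed fixed points, in line with the statement of the proposition.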

8 Continuity of the Index

We showed that the Whittle index is increasing on the domain of each fixed Christoffel word. However, we also need to show that the index is continuous as we move between words. So here we prove the following proposition.

Proposition 17.

Suppose $\lambda(\cdot)$ is as in the main paper. Then $\lambda(x)$ is a continuous function of $x\in\mathbb{R}_{+}$.

We use the following definitions.

Definition. Let $\tilde{w}$ be the reverse of word $w$, $w^{\omega}$ be the word constructed by concatenating $w$ infinitely many times, $|w|$ be the length of word $w$ and $|w|_{u}$ be the number of times that word $u$ is a factor of $w$.

Definition. For a possibly-infinite word $w$ and numbers $x\in\mathbb{R},\beta\in(0,1)$ define

\[ \begin{aligned}
S(w,x) &:=\sum_{n=0}^{|w|-1}\beta^{n}\phi_{w_{1:n}}(x)\\
\lambda(0w1,x) &:=\frac{1-\beta^{|0w1|}}{1-\beta}\left(S((01w)^{\omega},x)-S((10w)^{\omega},x)\right).
\end{aligned} \]

Remark. If $\pi$ is the $x$-threshold word then $\lambda(x)=\lambda(\pi,x)$ where $\lambda(x)$ is the Whittle index.

Remark. For a word $ab$, this definition gives

\[ S(ab,x)=S(a,x)+\beta^{|a|}S(b,\phi_{a}(x)) \tag{13} \]

so for $|\phi_{a^{\omega}}(x)|<\infty$ and $\beta\in(0,1)$ we have

\[ S(a^{\omega}b,x)=S(a^{\omega},x). \tag{14} \]

Further, if $x_{a}=\phi_{a}(x_{a})$ then the formula for the sum of a geometric progression gives

\[ S(a^{\omega},x_{a})=\frac{S(a,x_{a})}{1-\beta^{|a|}}. \tag{15} \]
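The identities (13) and (15) can be checked numerically on finite words. The sketch below is only an illustration: it assumes that $\phi_{0},\phi_{1}$ are the Möbius maps $\phi_{c}(x)=(x+1)/(cx+c+1)$ of the matrices $F,G$ recalled in Section 11, that words act from left to right, and it approximates $S(a^{\omega},x)$ by a long finite truncation. The names `S`, `apply_word`, `u`, `v` and the parameter values are hypothetical.

```python
def phi(c, x):
    # Assumed Moebius map of the matrix [[1, 1], [c, 1 + c]].
    return (x + 1.0) / (c * x + c + 1.0)

def apply_word(word, x, a, b):
    for letter in word:
        x = phi(a if letter == "0" else b, x)
    return x

def S(word, x, beta, a, b):
    # Discounted sum of the orbit points seen before each letter of a finite word.
    total, discount = 0.0, 1.0
    for letter in word:
        total += discount * x
        discount *= beta
        x = phi(a if letter == "0" else b, x)
    return total

a, b, beta = 0.1, 1.0, 0.9          # hypothetical parameters
u, v, x = "010", "10", 0.7          # two finite words and a start point

# Equation (13): splitting the sum at the end of the first word.
lhs13 = S(u + v, x, beta, a, b)
rhs13 = S(u, x, beta, a, b) + beta ** len(u) * S(v, apply_word(u, x, a, b), beta, a, b)

# Equation (15): geometric series at a fixed point of phi_u.
xu = 1.0
for _ in range(200):
    xu = apply_word(u, xu, a, b)
lhs15 = S(u * 200, xu, beta, a, b)  # truncation of S(u^omega, xu)
rhs15 = S(u, xu, beta, a, b) / (1.0 - beta ** len(u))
print(lhs13 - rhs13, lhs15 - rhs15)
```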

Definition. Let $X_{\pi}$ be the range of $x$ for which the $x$-threshold word is $\pi$.

The following construction is closely related to the beautiful Christoffel tree (Berstel et al, 2008).

Definition. Consider the mapping $C$ which takes a sequence of words and returns a sequence containing the original words mingled with the concatenation of neighbouring words as follows:

\[ C((a,b,c,d,\dots,x,y,z)):=(a,ab,b,bc,c,cd,d,\dots,x,xy,y,yz,z). \]

Now consider the sequences $t_{k}:=C^{(k)}((0,1))$ for $k\geq 0$. The first few such sequences are

\[ \begin{aligned}
t_{0} &=(0,\ 1)\\
t_{1} &=(0,\ 01,\ 1)\\
t_{2} &=(0,\ 001,\ 01,\ 011,\ 1)\\
t_{3} &=(0,\ 0001,\ 001,\ 00101,\ 01,\ 01011,\ 011,\ 0111,\ 1).
\end{aligned} \]

Remark. If $u\in t_{k}$ then $|u|\geq 1$ for any $k\geq 0$. Now suppose $u,v$ are adjacent in $t_{k}$ and we have $|uv|\geq k+2$. Then $t_{k+1}$ contains $u,uv,v$ from which we can construct $uuv$ and $uvv$. But $|uuv|=|u|+|uv|\geq 1+k+2=k+3$ and $|uvv|=|uv|+|v|\geq k+2+1=k+3$. Thus, by induction, we have shown that

\[ |uv|\geq k+2\quad\text{for any adjacent pair $u,v$ in $t_{k}$ and any $k\geq 0$.} \tag{16} \]
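The mapping $C$ and the sequences $t_{k}$ are easy to reproduce. The following sketch represents words as Python strings and checks the bound (16) for small $k$; the function names `mingle` and `t` are our own.

```python
def mingle(words):
    # The mapping C: keep every word and insert the concatenation of each
    # neighbouring pair between its two members.
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        out += [left + right, right]
    return out

def t(k):
    # t_k := C applied k times to the sequence (0, 1).
    words = ["0", "1"]
    for _ in range(k):
        words = mingle(words)
    return words

t3 = t(3)

# Bound (16): |uv| >= k + 2 for every adjacent pair u, v in t_k.
bound_holds = all(
    len(u + v) >= k + 2
    for k in range(8)
    for u, v in zip(t(k), t(k)[1:])
)
print(t3, bound_holds)
```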

8.1 Long Common Prefixes

We gather the results needed to prove Proposition 17. Most of these results relate to the notion that if $|x-y|$ is small and $a,b$ are the $x$- and $y$-threshold words, then the words $a,b$ usually have a long common prefix, although this is not always the case.

The following simple result is repeatedly used in the other Lemmas of this subsection.

Lemma 1.

Suppose $(0a1,0b1)$ is a standard pair. Then $a10b=b01a$.

Proof.

As $(0a1,0b1)$ is a standard pair, $0a10b1=:0w1$ is a Christoffel word. As $0a1,0b1,0w1$ are Christoffel words, $a,b,w$ are palindromes. Thus $a10b=w=\tilde{w}=\tilde{b}01\tilde{a}=b01a.$

If $(0a1,0b1)$ is a standard pair, then the interval $X_{0b1}$ is immediately to the left of $X_{0a1(0b1)^{\omega}}$. Since the words $0b1$ and $0a1(0b1)^{\omega}$ can differ within the first few letters, continuity of $\lambda(x)$ at $x=\sup X_{0b1}$ is not obvious. Similarly, $X_{(0a1)^{\omega}0b1}$ is immediately to the left of $X_{0a1}$. However, the factors $1-\beta^{|(0a1)^{\omega}0b1|}$ and $1-\beta^{|0a1|}$ appearing in the definitions of the corresponding Whittle indices are different for $|a|<\infty$. Thus continuity of $\lambda(x)$ at $x=\inf X_{0a1}$ is not obvious. The next two Lemmas address these questions.

Lemma 2.

Suppose $(0a1,0b1)$ is a standard pair and let $x=\phi_{10b}(x)$. Then

\[ \lambda(0b1,x)=\lambda(0a1(0b1)^{\omega},x). \]

Proof.

The right-hand side $\lambda(0a1(0b1)^{\omega},x)$ involves the sum

\[ \begin{aligned}
S(10a1(0b1)^{\omega},x) &=S(10b01a(10b)^{\omega},x) && \text{by Lemma 1}\\
&=S(10b,x)+\beta^{|10b|}S(01a(10b)^{\omega},\phi_{10b}(x)) && \text{by (13)}\\
&=S(10b,x)+\beta^{|10b|}S(01a(10b)^{\omega},x) && \text{as $x=\phi_{10b}(x)$}\\
&=(1-\beta^{|10b|})S((10b)^{\omega},x)+\beta^{|10b|}S(01a(10b)^{\omega},x) && \text{by (15).}
\end{aligned} \tag{17} \]

Now we note that repeated application of Lemma 1 gives

\[ 01a(10b)^{\omega}=01a10b(10b)^{\omega}=01b\,01a(10b)^{\omega}=(01b)^{\omega}01a. \tag{18} \]

Thus

\[ \begin{aligned}
\lambda(0a1(0b1)^{\omega},x) &=\frac{1-\beta^{|0a1(0b1)^{\omega}|}}{1-\beta}\left(S((01a1(0b1)^{\omega})^{\omega},x)-S((10a1(0b1)^{\omega})^{\omega},x)\right)\\
&=\frac{S(01a1(0b1)^{\omega},x)-S(10a1(0b1)^{\omega},x)}{1-\beta} && \text{by (14)}\\
&=\frac{1-\beta^{|10b|}}{1-\beta}\left(S(01a(10b)^{\omega},x)-S((10b)^{\omega},x)\right) && \text{by (17)}\\
&=\frac{1-\beta^{|10b|}}{1-\beta}\left(S((01b)^{\omega},x)-S((10b)^{\omega},x)\right) && \text{by (18)}\\
&=\lambda(0b1,x).
\end{aligned} \]

This completes the proof. ∎
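Lemma 2 can be illustrated numerically for the standard pair $(0a1,0b1)=(001,01)$, so that $a=0$ and $b=\epsilon$. The sketch below assumes the Möbius maps $\phi_{c}(x)=(x+1)/(cx+c+1)$ suggested by the matrices $F,G$ of Section 11, treats the factor $1-\beta^{|\cdot|}$ of an infinite word as 1 (as in the proof above), and truncates infinite words after a few hundred letters. The names `wa`, `wb`, `lam_short`, `lam_long` and the parameter values are hypothetical.

```python
def phi(c, x):
    # Assumed Moebius map of the matrix [[1, 1], [c, 1 + c]].
    return (x + 1.0) / (c * x + c + 1.0)

def apply_word(word, x, a, b):
    for letter in word:
        x = phi(a if letter == "0" else b, x)
    return x

def S(word, x, beta, a, b):
    # Discounted orbit sum for a finite word.
    total, discount = 0.0, 1.0
    for letter in word:
        total += discount * x
        discount *= beta
        x = phi(a if letter == "0" else b, x)
    return total

a, b, beta = 0.1, 1.0, 0.9      # hypothetical parameters
wa, wb = "0", ""                # palindromes of the standard pair (001, 01)
repeats = 300                   # truncation length for infinite words

# x is the fixed point of phi_{10b}, found by iteration.
x = 1.0
for _ in range(200):
    x = apply_word("10" + wb, x, a, b)

tail = ("10" + wb) * repeats
# lambda(0b1, x) from the definition.
lam_short = (1 - beta ** len("0" + wb + "1")) / (1 - beta) * (
    S(("01" + wb) * repeats, x, beta, a, b) - S(tail, x, beta, a, b)
)
# lambda(0a1(0b1)^omega, x): the prefactor 1 - beta^infinity is taken as 1.
lam_long = (
    S("01" + wa + tail, x, beta, a, b) - S("10" + wa + tail, x, beta, a, b)
) / (1 - beta)
print(lam_short, lam_long)
```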

Lemma 3.

Suppose $(0a1,0b1)$ is a standard pair and let $x=\phi_{01a}(x)$. Then

\[ \lambda((0a1)^{\omega}0b1,x)=\lambda(0a1,x). \]

Proof.

The left-hand side $\lambda((0a1)^{\omega}0b1,x)$ involves the sum

\[ \begin{aligned}
S(01(a10)^{\omega}0b1,x) &=S(01(a10)^{\omega},x) && \text{by (14)}\\
&=S(01a,x)+\beta^{|01a|}S((10a)^{\omega},\phi_{01a}(x)) && \text{by (13)}\\
&=S(01a,x)+\beta^{|01a|}S((10a)^{\omega},x) && \text{as $x=\phi_{01a}(x)$}\\
&=(1-\beta^{|01a|})S((01a)^{\omega},x)+\beta^{|01a|}S((10a)^{\omega},x) && \text{by (15).}
\end{aligned} \tag{19} \]

Thus

\[ \begin{aligned}
\lambda((0a1)^{\omega}0b1,x) &=\frac{1-\beta^{|(0a1)^{\omega}0b1|}}{1-\beta}\left(S((01(a10)^{\omega}0b1)^{\omega},x)-S((10(a10)^{\omega}0b1)^{\omega},x)\right)\\
&=\frac{1}{1-\beta}\left(S(01(a10)^{\omega}0b1,x)-S((10a)^{\omega},x)\right) && \text{by (14)}\\
&=\frac{1-\beta^{|01a|}}{1-\beta}\left(S((01a)^{\omega},x)-S((10a)^{\omega},x)\right) && \text{by (19)}\\
&=\lambda(0a1,x).
\end{aligned} \]

This completes the proof. ∎

To demonstrate continuity at other points, we will need to rely on the fact that nearby words often have a long common prefix as shown by the following two Lemmas.

Lemma 4.

Suppose $(0a1,0b1)$ is a subsequence of $t_{k}$ for some $k\geq 1$. Then $0b01a$ is a prefix of both $(0a1)^{\omega}$ and $0b(01b)^{\omega}$.

Proof.

Let $a=b\cdots$ indicate that $b$ is a prefix of word $a$ and consider the statements

\[ A(a,b):(a10)^{\omega}=b\cdots\qquad\text{and}\qquad B(a,b):(b01)^{\omega}=a\cdots. \]

It suffices to show that $A(a,b)$ and $B(a,b)$ are true for any adjacent words $0a1,0b1$ in $t_{k}$ for $k\geq 0$. This is because

\[ A(a,b)\ \Rightarrow\ (0a1)^{\omega}=0a10(a10)^{\omega}=0a10b\cdots=0b01a\cdots \]

where the last equality follows from Lemma 1 and

\[ B(a,b)\ \Rightarrow\ 0b(01b)^{\omega}=0b01(b01)^{\omega}=0b01a\cdots \]

which are the claims of the Lemma.

We shall use induction. Take $t_{2}=(0,001,01,011,1)$ as the base case. We must show that $A(0,\epsilon),B(0,\epsilon),A(\epsilon,1),B(\epsilon,1)$ are true. However these statements are respectively that $(010)^{\omega}=\epsilon\cdots$, $(01)^{\omega}=0\cdots$, $(10)^{\omega}=1\cdots$, $(101)^{\omega}=\epsilon\cdots$ and are all true.

Otherwise, say $A(a,b),B(a,b)$ are true for any adjacency $0a1,0b1$ in $t_{k}$. Let $0a10b1=0c1$ so

\[ c=a10b=b01a \]

using Lemma 1 again. Then the statements $A(a,c),B(a,c),A(c,b),B(c,b)$ are all true as

\[ \begin{aligned}
(a10)^{\omega} &=a10(a10)^{\omega}=a10b\cdots=c\cdots && \text{by $A(a,b)$ and as $c=a10b$}\\
(c01)^{\omega} &=c\cdots=a\cdots && \text{as $c=a10b$}\\
(c10)^{\omega} &=c\cdots=b\cdots && \text{as $c=b01a$}\\
(b01)^{\omega} &=b01(b01)^{\omega}=b01a\cdots=c\cdots && \text{by $B(a,b)$ and as $c=b01a$.}
\end{aligned} \]

Thus $A(a,b),B(a,b)$ are true for all adjacencies $0a1,0b1$ in $t_{k+1}$. This completes the proof. ∎
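Lemmas 1 and 4 are purely combinatorial, so they can be spot-checked by enumerating adjacent pairs in $t_{k}$. The sketch below follows the proof in treating adjacent interior words of $t_{k}$ as the pairs $(0a1,0b1)$ of the Lemmas; infinite words are replaced by sufficiently long finite repetitions, and the helper names are hypothetical.

```python
def mingle(words):
    # The mapping C from Section 8.
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        out += [left + right, right]
    return out

def t(k):
    words = ["0", "1"]
    for _ in range(k):
        words = mingle(words)
    return words

def check_pair(left, right, repeats=50):
    # left = 0a1 and right = 0b1; strip the outer letters to get a and b.
    a, b = left[1:-1], right[1:-1]
    lemma1 = (a + "10" + b) == (b + "01" + a)
    prefix = "0" + b + "01" + a
    lemma4 = (left * repeats).startswith(prefix) and (
        "0" + b + ("01" + b) * repeats
    ).startswith(prefix)
    return lemma1 and lemma4

# Interior words of t_k (all but the first word 0 and the last word 1).
all_ok = all(
    check_pair(left, right)
    for k in range(2, 7)
    for left, right in zip(t(k)[1:-1], t(k)[2:-1])
)
print(all_ok)
```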

Lemma 5.

Suppose $0a1,0b1$ are adjacent in $t_{k}$ and that $0c1$ lies strictly between them in $t_{k^{\prime}}$ for some $0<k<k^{\prime}$. Then $0c1=0b01a\cdots$.

Proof.

The interval of $t_{k^{\prime}}$ between $0a1,0b1$ is constructed from $0a1,0b1$ in exactly the same way as $t_{k^{\prime}-k}$ was constructed from $0,1$. Thus $0c1=(0a1)^{q}0b1\cdots$ for some positive integer $q$. Now recall that $0b01a=0a10b$ by Lemma 1. Thus $0c1=(0a1)^{q-1}0a10b1\cdots=(0a1)^{q-1}0b01a1\cdots=0b(01a)^{q}1\cdots=0b01a\cdots$ as claimed. ∎

Although the existence of a long common prefix for nearby words suggests continuity, to prove anything we must bound the residual after removing the long common prefix. The following Lemma is one way to achieve this.

Lemma 6.

Suppose $x\geq y\geq 0$, let $0w1$ be the $x$-threshold word and let $(01w)^{\omega}=su$, $(10w)^{\omega}=s^{\prime}u^{\prime}$ where $|s|=|s^{\prime}|$. Then $|S(u,\phi_{s}(y))-S(u^{\prime},\phi_{s^{\prime}}(y))|\leq\frac{x+1}{1-\beta}.$

Proof.

The highest point on the orbits $(\phi_{((01w)^{\omega})_{1:k}}(x):k\geq 0)$ and $(\phi_{((10w)^{\omega})_{1:k}}(x):k\geq 0)$ is $x+1$ since $0w1$ is the $x$-threshold word. The terms $a_{k},b_{k}$ of the discounted sums

$S(u,\phi_{s}(y))=:\sum_{k=0}^{\infty}\beta^{k}a_{k}$ and $S(u^{\prime},\phi_{s^{\prime}}(y))=:\sum_{k=0}^{\infty}\beta^{k}b_{k}$

are from the orbits $(\phi_{((01w)^{\omega})_{1:k}}(y):k\geq 0)$ and $(\phi_{((10w)^{\omega})_{1:k}}(y):k\geq 0)$, and $\phi_{u^{\prime\prime}}(x)\geq\phi_{u^{\prime\prime}}(y)$ for any word $u^{\prime\prime}$ as $x\geq y$. Therefore the terms $a_{k},b_{k}$ are also no higher than $\phi_{0}(x)\leq x+1$. Furthermore, the terms $a_{k},b_{k}$ are non-negative, so that $|a_{k}-b_{k}|\leq x+1$. Thus $|S(u,\phi_{s}(y))-S(u^{\prime},\phi_{s^{\prime}}(y))|\leq\sum_{k=0}^{\infty}\beta^{k}|a_{k}-b_{k}|\leq\sum_{k=0}^{\infty}\beta^{k}(x+1)=\frac{x+1}{1-\beta}.$

Although it is clear that $\lambda(\pi,x)$ is continuous, a bound on its slope is helpful.

Lemma 7.

Suppose $x\geq 0$ and that $0w1$ is a valid word. Then $|\lambda^{\prime}(0w1,x)|\leq\frac{1}{(1-\beta)^{2}}$.

Proof.

The definition of $\lambda(0w1,x)$ gives

\[ |\lambda^{\prime}(0w1,x)|\leq\frac{1-\beta^{|0w1|}}{1-\beta}\sum_{k=0}^{\infty}\beta^{k}\left|\phi_{((01w)^{\omega})_{1:k}}^{\prime}(x)-\phi_{((10w)^{\omega})_{1:k}}^{\prime}(x)\right|\leq\frac{1}{1-\beta}\sum_{k=0}^{\infty}\beta^{k}=\frac{1}{(1-\beta)^{2}} \]

where the second inequality follows as $0\leq\beta^{|0w1|}<1$ and $0\leq\phi_{u}^{\prime}(x)\leq 1$ for any word $u$ since $0\leq\phi_{1}^{\prime}(x)\leq\phi_{0}^{\prime}(x)\leq 1$. ∎

We use one more result about the maps $\phi_{0},\phi_{1}$ of the main paper.

Lemma 8.

Suppose $\phi_{0}(x)$ and $\phi_{1}(x)$ are as in the main paper and $x\in\mathbb{R}_{+}$. Then $\phi_{01}(x)<\phi_{10}(x)$.

Proof.

The definitions of $\phi_{0},\phi_{1}$ give

\[ \phi_{10}(x)-\phi_{01}(x)=(b-a)\frac{(ab+b+a)x^{2}+(2ab+3b+3a+2)x+ab+2b+2a+3}{((ab+b+a)x+ab+b+2a+1)((ab+b+a)x+ab+2b+a+1)} \]

which is positive as $b>a$ and $x\geq 0$. ∎
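The closed form in this proof can be checked with exact rational arithmetic. The sketch below assumes, as is not restated in this section, that $\phi_{0},\phi_{1}$ are the Möbius maps $\phi_{c}(x)=(x+1)/(cx+c+1)$ of the matrices $F$ (with $c=a$) and $G$ (with $c=b$) recalled in Section 11, and that $\phi_{10}$ means "apply letter 1 first, then letter 0". The function names and sample values are hypothetical.

```python
from fractions import Fraction

def phi(c, x):
    # Assumed Moebius map of the matrix [[1, 1], [c, 1 + c]].
    return (x + 1) / (c * x + c + 1)

def lemma8_rhs(a, b, x):
    # Right-hand side of the displayed formula in the proof of Lemma 8.
    s = a * b + b + a
    numerator = s * x**2 + (2*a*b + 3*b + 3*a + 2) * x + a*b + 2*b + 2*a + 3
    denominator = (s * x + a*b + b + 2*a + 1) * (s * x + a*b + 2*b + a + 1)
    return (b - a) * numerator / denominator

a, b = Fraction(1, 10), Fraction(3, 2)     # hypothetical values with b > a >= 0
gaps = []
for x in [Fraction(0), Fraction(1, 3), Fraction(2), Fraction(17, 5)]:
    phi_10 = phi(a, phi(b, x))             # letter 1 first, then letter 0
    phi_01 = phi(b, phi(a, x))             # letter 0 first, then letter 1
    gaps.append((phi_10 - phi_01, lemma8_rhs(a, b, x)))
print(gaps)
```

Because the arithmetic is exact, the two entries of each pair can be compared with `==` rather than with a tolerance.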

Our proof of continuity will rely on the standard $(\epsilon,\delta)$ definition in which we will put $\delta=l_{k}$ where $l_{k}$ is defined in the following Lemma.

Lemma 9.

For any $\epsilon>0$ there is a $k<\infty$ such that $0<l_{k}:=\inf\{|X_{\pi}|:\pi\in t_{k}\}<\epsilon.$

Proof.

Say $0a1,0b1$ are adjacent in $t_{k}$. Then by construction of $t_{k+i}$, the gap $(z_{10b},z_{01a})$ contains $2^{i}-1$ intervals corresponding to words of $t_{k+i}\backslash t_{k}$. The shortest of these intervals is at most $\frac{z_{01a}-z_{10b}}{2^{i}-1}$ in length. Thus $\lim_{k\rightarrow\infty}l_{k}=0$. This demonstrates the existence of a $k<\infty$ such that $l_{k}<\epsilon$.

To show that $l_{k}>0$ for finite $k$, we shall demonstrate that assuming $l_{k}=0$ leads to a contradiction. If $l_{k}=0$ then there is some word $0w1\in t_{k}$ such that $z_{10w}=z_{01w}=:x$. Therefore $\phi_{10w}(x)=\phi_{01w}(x)$. Now in $\mathbb{R}_{+}$, the functions $\phi_{0}(x),\phi_{1}(x)$ have inverses, so $\phi_{w}^{-1}(x)$ is well-defined. Therefore

\[ \phi_{10}(x)=\phi_{w}^{-1}\circ\phi_{10w}(x)=\phi_{w}^{-1}\circ\phi_{01w}(x)=\phi_{01}(x) \]

which contradicts Lemma 8 as $x\geq 0$. ∎

8.2 Proof of Continuity

Proof.

We wish to show that for any $\epsilon>0$, there exists a $\delta>0$ such that for any $|x-y|<\delta$ we have $\Delta:=|\lambda(x)-\lambda(y)|<\epsilon$. Without loss of generality we assume that $x\geq y$.

Specifically, we shall put $\delta=l_{k}>0$ where $l_{k}$ is as defined in Lemma 9 and $k$ is any positive integer such that $\frac{l_{k}}{(1-\beta)^{2}}<\frac{\epsilon}{2}$ and such that $2\frac{x+1}{(1-\beta)^{2}}\beta^{k+1}<\frac{\epsilon}{2}$. The existence of such a $k$ is guaranteed by Lemma 9 and because $\beta\in(0,1)$.

Let $0a1,0b1$ be the $x$- and $y$-threshold words. If these words are the same then

\[ \Delta=|\lambda(0a1,x)-\lambda(0a1,y)|\leq|y-x|\sup_{z\in[y,x]}|\lambda^{\prime}(0a1,z)|\leq\frac{|y-x|}{(1-\beta)^{2}}\leq\frac{l_{k}}{(1-\beta)^{2}}<\frac{\epsilon}{2} \]

where the first inequality is the mean value theorem, the second follows from Lemma 7, the third from $|y-x|<\delta=l_{k}$ and the fourth from the definition of $k$.

Otherwise $0a1\neq 0b1$. In this case, let $(0e1,0b1)$ be the standard pair for word $0b1$, let $\underline{a}=\phi_{01a}(\underline{a})$ and $\bar{b}=\phi_{10b}(\bar{b})$, so that Lemma 2 applies at $\bar{b}$ and Lemma 3 applies at $\underline{a}$. Noting that $y\leq\bar{b}\leq\underline{a}\leq x$, our strategy is to write

\[ \begin{aligned}
\Delta &=|\Delta_{1}+\Delta_{2}+\Delta_{3}+\Delta_{4}+\Delta_{5}+\Delta_{6}|\\
\Delta_{1} &:=\lambda(0b1,y)-\lambda(0b1,\bar{b})\\
\Delta_{2} &:=\lambda(0b1,\bar{b})-\lambda(0e1(0b1)^{\omega},\bar{b})\\
\Delta_{3} &:=\lambda(0e1(0b1)^{\omega},\bar{b})-\lambda((0a1)^{\omega},\bar{b})\\
\Delta_{4} &:=\lambda((0a1)^{\omega},\bar{b})-\lambda((0a1)^{\omega},\underline{a})\\
\Delta_{5} &:=\lambda((0a1)^{\omega},\underline{a})-\lambda(0a1,\underline{a})\\
\Delta_{6} &:=\lambda(0a1,\underline{a})-\lambda(0a1,x).
\end{aligned} \]

Lemma 7 and the choice of $\delta$ give

\[ |\Delta_{1}|+|\Delta_{4}|+|\Delta_{6}|\leq\frac{\bar{b}-y+\underline{a}-\bar{b}+x-\underline{a}}{(1-\beta)^{2}}<\frac{l_{k}}{(1-\beta)^{2}}\leq\frac{\epsilon}{2} \tag{20} \]

while Lemmas 2 and 3 give

\[ \Delta_{2}=\Delta_{5}=0. \tag{21} \]

It remains to consider $\Delta_{3}$. It follows from the definition of $l_{k}$ that for some adjacent words $0c1,0d1$ in $t_{k}$: either $0a1=0c1$ or $0a1$ is a word strictly between $0c1$ and $0d1$ in the sense of Lemma 5; and that $0e1(0b1)^{\omega}$ is a word strictly between $0c1$ and $0d1$. Thus by Lemma 5 we have $(0a1)^{\omega}=0pu$ and $0e1(0b1)^{\omega}=0pv$ where $p:=d01c$ and $u,v$ are the appropriate suffixes. Therefore the definition of $\lambda(w,x)$ gives

\[ \begin{aligned}
|\Delta_{3}| &=\left|\lambda((0a1)^{\omega},\bar{b})-\lambda(0e1(0b1)^{\omega},\bar{b})\right|\\
&=\frac{1}{1-\beta}\Big|S(01p,\bar{b})+\beta^{|01p|}S(u,\phi_{01p}(\bar{b}))-S(10p,\bar{b})-\beta^{|10p|}S(u,\phi_{10p}(\bar{b}))\\
&\qquad\qquad-S(01p,\bar{b})-\beta^{|01p|}S(v,\phi_{01p}(\bar{b}))+S(10p,\bar{b})+\beta^{|10p|}S(v,\phi_{10p}(\bar{b}))\Big|\\
&=\frac{\beta^{|01p|}}{1-\beta}\left|S(u,\phi_{01p}(\bar{b}))-S(u,\phi_{10p}(\bar{b}))-S(v,\phi_{01p}(\bar{b}))+S(v,\phi_{10p}(\bar{b}))\right|\\
&\leq\frac{\beta^{|01p|}}{1-\beta}\left(\left|S(u,\phi_{01p}(\bar{b}))-S(u,\phi_{10p}(\bar{b}))\right|+\left|S(v,\phi_{01p}(\bar{b}))-S(v,\phi_{10p}(\bar{b}))\right|\right)\\
&\leq\frac{\beta^{|01p|}}{1-\beta}\left(\frac{\underline{a}+1}{1-\beta}+\frac{\bar{b}+1}{1-\beta}\right)\\
&\leq\frac{\beta^{k+1}}{(1-\beta)^{2}}2(x+1)\\
&<\frac{\epsilon}{2}
\end{aligned} \tag{22} \]

where the last four inequalities follow from the triangle inequality, from Lemma 6, from equation (16) coupled with the fact that $\bar{b}\leq\underline{a}\leq x$, and finally from the definition of $k$.

Finally, coupling (20), (21) and (22) and using the triangle inequality gives

\[ \Delta<\frac{\epsilon}{2}+0+\frac{\epsilon}{2}=\epsilon. \]

This completes the proof. ∎

9 Properties of the Linear-System Orbits $M(w)$

Recall the definitions about words from the main paper, particularly that $\tilde{w}$ is the reverse of $w$. Also, recall the definitions of the matrices $F,G,K,M(w)$. The first of the following propositions is used to prove the second. The second appears in the main paper.

Proposition 18.

Suppose $w,w^{\prime}$ are any words. Then

  1. $\det(M(w))=1$,

  2. $M(\tilde{w})=KM(w)^{-1}K$,

  3. $M(w)=\begin{pmatrix}e&f\\ \frac{eh-1}{f}&h\end{pmatrix}$ for some $e,f,h\in\mathbb{R}$,

  4. $M(w)-M(\tilde{w})=\lambda K$ for some $\lambda\in\mathbb{R}$,

  5. $\displaystyle\frac{[M(w01w^{\prime})]_{22}}{[M(w01w^{\prime})]_{21}}\geq\frac{[M(w10w^{\prime})]_{22}}{[M(w10w^{\prime})]_{21}}$,

  6. $[M(w)]_{22}\geq[M(w)]_{21}$.

Proof.

Claim 1. $\det(M(w))=\prod_{i=1}^{|w|}\det(M(w_{i}))=1$ as $\det(F)=\det(G)=1$.
Claim 2. The definitions of $F,G,K$ give $KF=F^{-1}K$ and $KG=G^{-1}K$. Thus $KM(w)=M(w_{|w|})^{-1}\cdots M(w_{1})^{-1}K=M(\tilde{w})^{-1}K$. The result follows as $K^{2}=I$.
Claim 3. Put $M(w)=:\begin{pmatrix}e&f\\ g&h\end{pmatrix}$ and solve $\det(M(w))=1=eh-fg$ for $g$.
Claim 4. Substituting Claim 2 and Claim 3 in Claim 4 gives $M(w)-KM(w)^{-1}K=(h-e-g)K.$
Claim 5. Put $M:=M(w),N:=M(w^{\prime})$. We calculate

\[ \begin{aligned}
&[NGFM]_{22}[NFGM]_{21}-[NGFM]_{21}[NFGM]_{22}\\
&\quad=(b-a)(M_{11}M_{22}-M_{12}M_{21})((ab+b+a)N_{22}^{2}+(b+a+2)N_{21}N_{22}+N_{21}^{2})\geq 0
\end{aligned} \]

as $b>a\geq 0$, $\det(M)=1$ and $N\geq 0$. The result follows as $NFGM\geq 0$ and $NGFM\geq 0$.
Claim 6. If $w=\epsilon$ then $[M(w)]_{22}-[M(w)]_{21}=1\geq 0$. Otherwise we use induction on $|w|$ to show that $M(w)v\geq 0$ where $v:=(-1,1)^{T}$. In the base case $w\in\{0,1\}$ so

\[ M(w)v=\begin{pmatrix}1&1\\ c&1+c\end{pmatrix}\begin{pmatrix}-1\\ 1\end{pmatrix}=\begin{pmatrix}0\\ 1\end{pmatrix}\geq 0\qquad\text{for some $c\in\{a,b\}$}. \]

For the inductive step, assume $w\in\{0u,1u\}$ for some word $u$ satisfying $M(u)v\geq 0$. Then

\[ M(w)v=\begin{pmatrix}1&1\\ c&1+c\end{pmatrix}M(u)v\geq 0\qquad\text{for some $c\in\{a,b\}$.} \]

As $[M(w)v]_{2}=[M(w)]_{22}-[M(w)]_{21}$, this completes the proof. ∎
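Claims 1, 2, 4 and 6 can be spot-checked with exact arithmetic. In the sketch below, $F$ and $G$ are the matrices recalled in Section 11, while $K$ is reconstructed from the identity $GF-FG=(b-a)K$ quoted in the proof of Proposition 19, since the main paper's definition is not reproduced here. The product order used for $M(w)$ (later letters multiply on the left) is an assumption; the four claims checked do not depend on it. All helper names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    # Product of two 2x2 matrices stored as nested tuples.
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(A):
    (e, f), (g, h) = A
    det = e * h - f * g
    return ((h / det, -f / det), (-g / det, e / det))

def M(word, a, b):
    # Assumed convention: M(w) = M(w_n) ... M(w_1), with M(0) = F and M(1) = G.
    product = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
    for letter in word:
        c = a if letter == "0" else b
        step = ((Fraction(1), Fraction(1)), (c, 1 + c))
        product = matmul(step, product)
    return product

# K reconstructed from GF - FG = (b - a) K.
K = ((Fraction(-1), Fraction(-1)), (Fraction(0), Fraction(1)))
a, b = Fraction(1, 10), Fraction(3, 2)     # hypothetical values with b > a >= 0

def check(word):
    Mw, Mrev = M(word, a, b), M(word[::-1], a, b)
    (e, f), (g, h) = Mw
    claim1 = e * h - f * g == 1
    claim2 = Mrev == matmul(matmul(K, inverse(Mw)), K)
    lam = h - e - g
    diff = tuple(tuple(Mw[i][j] - Mrev[i][j] for j in range(2)) for i in range(2))
    claim4 = diff == tuple(tuple(lam * K[i][j] for j in range(2)) for i in range(2))
    claim6 = h >= g
    return claim1 and claim2 and claim4 and claim6

results = {word: check(word) for word in ["", "0", "1", "01", "0010", "110100", "0100101"]}
print(results)
```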

Proposition 19.

Suppose $w$ is a word, $p$ is a palindrome and $n\in\mathbb{Z}_{+}$. Then

  1. $M(p)=\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix}$ for some $f,h\in\mathbb{R}$,

  2. $\text{tr}(M(10p))=\text{tr}(M(01p))$,

  3. If $u\in\{p(10p)^{n},(10p)^{n}10\}$ then $M(u)-M(\tilde{u})=\lambda K$ for some $\lambda\in\mathbb{R}_{-}$,

  4. If $w$ is a prefix of $p$ then $[M(p(10p)^{n}10w)]_{22}\leq[M(p(01p)^{n}01w)]_{22}$,

  5. $[M((10p)^{n}10w)]_{21}\geq[M((01p)^{n}01w)]_{21}$,

  6. $[M((10p)^{n}1)]_{21}\geq[M((01p)^{n}0)]_{21}$.

Proof.

In this proof, we refer to Claim $k$ of Proposition 18 as P$k$.
Claim 1. P2 gives $M(p)=KM(p)^{-1}K$ as $p=\tilde{p}$. But in the notation of P3, $[M(p)]_{11}=[KM(p)^{-1}K]_{11}$ says $e=h-(eh-1)/f$. Solve this for $e$ and substitute in P3.
Claim 2. Noting that $GF-FG=(b-a)K$, the notation of Claim 1 gives

\[ \text{tr}(M(01p))-\text{tr}(M(10p))=\text{tr}(M(p)(GF-FG))=(b-a)\text{tr}\left(\begin{pmatrix}\frac{fh+1}{h+f}&f\\ \frac{h^{2}-1}{h+f}&h\end{pmatrix}K\right)=0. \]

Claim 3. Note we can move from $u$ to $\tilde{u}$ just by swapping some $10$ for $01$. So, repeated application of P5 gives the inequality $\frac{[M(u)]_{22}}{[M(u)]_{21}}\leq\frac{[M(\tilde{u})]_{22}}{[M(\tilde{u})]_{21}}.$ But the denominators of this inequality are equal (and non-negative) as P4 gives $[M(u)]_{21}-[M(\tilde{u})]_{21}=\lambda^{\prime}K_{21}=0$ for some $\lambda^{\prime}\in\mathbb{R}$. Thus this inequality reduces to $[M(u)]_{22}\leq[M(\tilde{u})]_{22}$. Yet P4 also gives $[M(u)-M(\tilde{u})]_{22}=\lambda K_{22}$ which combined with the previous sentence says that $\lambda K_{22}\leq 0$. As $K_{22}=1$, this gives $\lambda\in\mathbb{R}_{-}$.
Claim 4. Let $s$ be the corresponding suffix so $p=ws$ and

\[ M(p(10p)^{n}10w)-M(p(01p)^{n}01w)=M(s)^{-1}(M(p(10p)^{n+1})-M(p(01p)^{n+1}))=:A. \]

But Claim 3 with $u=p(10p)^{n+1}$ gives

\[ [A]_{22}=\underbrace{\lambda[M(s)^{-1}K]_{22}}_{\text{for some $\lambda\leq 0$}}=\underbrace{\lambda[KM(\tilde{s})]_{22}}_{\text{by P2}}=\lambda([M(\tilde{s})]_{22}-[M(\tilde{s})]_{21})\leq\underbrace{0}_{\text{by P6.}} \]

Claim 5. As $M(w)\geq 0$, Claim 3 with $u=(10p)^{n}10$ gives

\[ [M(w)(M((10p)^{n}10)-M((01p)^{n}01))]_{21}=\lambda[M(w)K]_{21}=\lambda[-M(w)]_{21}\geq 0. \]

Claim 6. Let $E:=\begin{pmatrix}0&0\\ 1&1\end{pmatrix}.$ Then $G-F=(b-a)E\geq 0$, so that

\[ \begin{aligned}
[GM((10p)^{n})-FM((01p)^{n})]_{21} &=[(b-a)EM((10p)^{n})+FM((10p)^{n})-FM((01p)^{n})]_{21}\\
&\geq[M((10p)^{n}0)-M((01p)^{n}0)]_{21}\geq 0
\end{aligned} \]

by Claim 5. This completes the proof. ∎
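Claims 1 and 2 can be checked exactly on a few palindromes. As before, this is only a sketch: $F,G$ are the matrices recalled in Section 11, the product order used for $M(w)$ is an assumption (the trace identity holds for either order by cyclicity of the trace), and the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    # Product of two 2x2 matrices stored as nested tuples.
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def M(word, a, b):
    # Assumed convention: later letters multiply on the left; M(0) = F, M(1) = G.
    product = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
    for letter in word:
        c = a if letter == "0" else b
        product = matmul(((Fraction(1), Fraction(1)), (c, 1 + c)), product)
    return product

def trace(A):
    return A[0][0] + A[1][1]

a, b = Fraction(1, 10), Fraction(3, 2)     # hypothetical values with b > a >= 0

def check_palindrome(p):
    assert p == p[::-1], "p must be a palindrome"
    (e, f), (g, h) = M(p, a, b)
    # Claim 1: the first column is determined by f and h.
    claim1 = e == (f * h + 1) / (h + f) and g == (h * h - 1) / (h + f)
    # Claim 2: swapping the leading 10 for 01 leaves the trace unchanged.
    claim2 = trace(M("10" + p, a, b)) == trace(M("01" + p, a, b))
    return claim1 and claim2

palindrome_results = {p: check_palindrome(p) for p in ["", "0", "1", "010", "0110", "10101"]}
print(palindrome_results)
```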

10 Majorisation

In the main paper, we used one result about majorisation which was similar-but-not-identical to any results in Marshall, Olkin and Arnold (2011). Let us prove that result.

Proposition 20.

Suppose $x,y\in\mathbb{R}_{+}^{m}$ and $f:\mathbb{R}\rightarrow\mathbb{R}$ is a symmetric function that is convex and decreasing on $\mathbb{R}_{+}$. Then $\text{$x\prec^{w}y$ and $\beta\in[0,1]$}\quad\Rightarrow\quad\sum_{i=1}^{m}\beta^{i}f(x_{(i)})\geq\sum_{i=1}^{m}\beta^{i}f(y_{(i)})$.

Proof.

As the claim relates to $x_{(i)}$ and $y_{(i)}$ we assume that $x_{i}$ and $y_{i}$ are in ascending order.

Marshall et al (3H2B, page 133) says that if $g:\mathcal{A}\rightarrow\mathbb{R}$ is a non-decreasing and convex function on $\mathcal{A}\subseteq\mathbb{R}$ and $(u_{1},\dots,u_{m})$ is a non-increasing and non-negative sequence, then for all non-increasing sequences $(p_{1},\dots,p_{m})$ the function $\phi(p):=\sum_{i=1}^{m}u_{i}g(p_{i})$ is Schur-convex.

Indeed the function $f$ is increasing and convex for $p\in\mathbb{R}_{-}$ (such as $p=-x$ and $p=-y$) and $(\beta,\dots,\beta^{m})$ is a non-increasing and non-negative sequence for $\beta\in[0,1]$. Thus for all non-increasing sequences $(p_{1},\dots,p_{m})$ on $\mathbb{R}_{-}^{m}$ the function $\psi(p):=\sum_{i=1}^{m}\beta^{i}f(p_{i})$ is Schur-convex.

Recall (ibid, page 12) that $a\in\mathbb{R}^{m}$ is said to be weakly submajorised by $b\in\mathbb{R}^{m}$, written $a\prec_{w}b$, if

\[ \sum_{i=1}^{k}a_{[i]}\leq\sum_{i=1}^{k}b_{[i]},\quad k=1,\dots,m\qquad\text{where $a_{[i]}$ denotes $a$ in descending order} \]

and that $a\prec_{w}b\Leftrightarrow-a\prec^{w}-b$ (ibid, page 13).

However (ibid, 3A8, page 87) if $\phi(p)$ is a real function on $\mathcal{A}\subset\mathbb{R}^{m}$ which is non-decreasing in each argument $p_{i}$ and Schur-convex on $\mathcal{A}$ and $p\prec_{w}q$ on $\mathcal{A}$ then $\phi(p)\leq\phi(q)$.

Indeed, the function $\psi(p)=\sum_{i=1}^{m}\beta^{i}f(p_{i})$ is a real function on $\mathbb{R}_{-}^{m}$ which is non-decreasing in each argument and Schur-convex on $\mathbb{R}_{-}^{m}$ for all non-increasing sequences $(p_{1},\dots,p_{m})$. Furthermore, $-y\prec_{w}-x$ as $x\prec^{w}y$. Therefore $\psi(y)=\psi(-y)\leq\psi(-x)=\psi(x)$ as claimed. ∎

11 Clarification of Theorem 1 for $0\leq x\leq y_{1}$ or $y_{0}\leq x<\infty$

Recall the following definitions and assumption from the main paper:

\[ F:=\begin{pmatrix}1&1\\ a&1+a\end{pmatrix},\quad G:=\begin{pmatrix}1&1\\ b&1+b\end{pmatrix},\quad E:=\begin{pmatrix}0&0\\ 1&1\end{pmatrix},\quad v(x):=\begin{pmatrix}x\\ 1\end{pmatrix},\quad b>a\geq 0. \]

If $0\leq x\leq y_{1}$ or $y_{0}\leq x<\infty$ then the relevant linear systems, (9) in the main paper, are

\[ \left.\begin{aligned}(M(1^{k+1})-M(01^{k}))v(x)&=(G-F)G^{k}v(x)=(b-a)EG^{k}v(x)\geq 0\\ (M(10^{k})-M(0^{k+1}))v(x)&=(G-F)F^{k}v(x)=(b-a)EF^{k}v(x)\geq 0\end{aligned}\right\}\quad\text{for $k\in\mathbb{Z}_{+}$} \]

where both inequalities follow as $E,F,G$ are all $\geq 0$, as $b>a$ and as $x\geq\min\{y_{0},y_{1}\}\geq 0.$ Therefore all cumulative sums of the above expressions are non-negative, so the derivative of the numerator of the Whittle index is non-negative by the same weak-supermajorisation argument as in the main paper.
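The two identities above only use $G-F=(b-a)E$ and the non-negativity of $E,F,G$, so they are easy to confirm with exact arithmetic. The following sketch does so for a few hypothetical values of $a,b,x$ and $k$; the helper names `matvec` and `system` are our own.

```python
from fractions import Fraction

def matvec(A, v):
    # Product of a 2x2 matrix with a 2-vector.
    return tuple(A[i][0] * v[0] + A[i][1] * v[1] for i in range(2))

a, b = Fraction(1, 10), Fraction(3, 2)     # hypothetical values with b > a >= 0
F = ((1, 1), (a, 1 + a))
G = ((1, 1), (b, 1 + b))
E = ((0, 0), (1, 1))

def system(step, x, k):
    # Returns (G - F) step^k v(x) and (b - a) E step^k v(x) for step in {F, G}.
    v = (x, Fraction(1))
    for _ in range(k):
        v = matvec(step, v)
    lhs = tuple(g - f for g, f in zip(matvec(G, v), matvec(F, v)))
    rhs = tuple((b - a) * e for e in matvec(E, v))
    return lhs, rhs

checks = []
for step in (F, G):
    for x in (Fraction(0), Fraction(1, 2), Fraction(3)):
        for k in range(6):
            lhs, rhs = system(step, x, k)
            checks.append(lhs == rhs and min(lhs) >= 0)
print(all(checks))
```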

Meanwhile, the denominator of the index in these cases is

\[ \sum_{k=0}^{\infty}\beta^{k}((1^{\omega})_{k+1}-(01^{\omega})_{k+1})=\beta=\sum_{k=0}^{\infty}\beta^{k}((10^{\omega})_{k+1}-(0^{\omega})_{k+1}) \]

which is non-negative. Therefore the rest of the proof of Theorem 1 follows as in the main paper.

In fact we could say that the majorisation point, which is $\phi_{w}(0)$ for words $0w1$ in the main paper, is $-1$ in both cases. Indeed, Claim 6 of Proposition 4 of the main paper says that $Fv(-1)=Gv(-1)=v(0)$. Also, $Ev(-1)=(0,0)^{T}.$ Thus for all $k\in\mathbb{Z}_{+}$, $EG^{k}v(-1)\geq EF^{k}v(-1)\geq 0$ whereas $Ev(-1-\epsilon)<0$ for any $\epsilon>0.$