When are Kalman-Filter Restless Bandits Indexable?
Abstract
We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.
1 Introduction
We study the problem of monitoring several time series so as to maintain a precise belief while minimising the cost of sensing. Such problems can be viewed as POMDPs with belief-dependent rewards [3] and their applications include active sensing [7], attention mechanisms for multiple-object tracking [22], as well as online summarisation of massive data from time-series [4]. Specifically, we discuss the restless bandit [24] associated with the discrete-time Kalman filter [19].
Restless bandits generalise bandit problems [6, 8] to situations where the state of each arm (project, site or target) continues to change even if the arm is not played. As with bandit problems, the states of the arms evolve independently given the actions taken, suggesting that there might be efficient algorithms for large-scale settings, based on calculating an index for each arm, which is a real number associated with the (belief-)state of that arm alone. However, while bandits always have an optimal index policy (select the arm with the largest index), it is known that no index policy can be optimal for some discrete-state restless bandits [17] and such problems are in general PSPACE-hard even to approximate to any non-trivial factor [10]. Further, in this paper we address restless bandits with real-valued rather than discrete states.
On the other hand, Whittle proposed a natural index policy for restless bandits [24], but this policy only makes sense when the restless bandit is indexable, as we now explain. Say we have restless bandits and we are constrained to play arms at each time. Whittle considered relaxing this constraint by only requiring that the time-average number of arms played is . Now the optimal average cost for this relaxed problem is a lower bound on the optimal average cost for the original problem. Also, the relaxed problem can be separated into single-arm problems by the method of Lagrange multipliers, making it relatively easy to solve. In this separated version of the relaxed problem, each arm behaves identically to an arm in the original problem, except that an additional price is charged each time the arm is played, where corresponds to the Lagrange multiplier for the relaxed constraint. Now let us consider a family of optimal policies which achieves the optimal cost-to-go for a single arm with price and which takes actions when in state where means passive and means active. At first glance, we might intuitively suppose that it becomes less and less attractive to be active as the price increases so that as the price is increased beyond some value , the optimal action switches from active to passive. At this price we are ambivalent between being active and passive so that . Such a value is called the Whittle index for arm in state . Indeed if there is a family of optimal policies for which
then an optimal solution to the relaxed problem for price is to activate arm if and only if . If a restless bandit satisfies this condition, it is said to be indexable. It is important to note that some restless bandits are not indexable, so activating arm if and only if does not correspond to an optimal solution to the relaxed problem. Indeed, in a study of small randomly-generated problems, Weber and Weiss [23] found that roughly 10% of problems were not indexable.
As a policy based on is so good for the relaxed problem when the arms are indexable, this motivates us to use as a heuristic for the original problem. This heuristic is called Whittle’s index policy and at each time it activates the arms with the highest indexes . Further motivation for studying indexability is that for ordinary bandits the Whittle index reduces to the Gittins index, making the Whittle index policy optimal when only one arm may be active at each time, that is when . More generally, Whittle’s index policy is not optimal for some restless bandit problems even when the arms are indexable, but indexability is still a rather useful concept, since if all arms are indexable and certain other conditions hold, Whittle’s policy is asymptotically optimal, as we now explain. Consider a sequence of restless bandit problems parameterised by the number of indexable arms and in which of the arms can be simultaneously active for some fixed . Then as tends to infinity, the time-average cost per arm for Whittle’s index policy converges to the time-average cost per arm for an optimal policy, provided a certain fluid approximation has a unique fixed point. This result was first demonstrated by Weber and Weiss [23] who for simplicity of exposition only considered the symmetric case in which the arms have identical costs and transition probabilities. Recently, Verloop [20] extended this result to asymmetric cases involving multiple types of arms. Interestingly, this extension also covers cases where new arms arrive and old arms depart.
Restless bandits associated with scalar Kalman(-Bucy) filters in continuous time were recently shown to be indexable [12] and the corresponding discrete-time problem has attracted considerable attention over a long period [15, 11, 16, 21]. However, that attention has produced no satisfactory proof of indexability – even for scalar time-series and even if we assume that there is a monotone optimal policy for the single-arm problem, which is a policy that plays the arm if and only if the relevant belief-state exceeds some threshold (here the relevant belief-state is a posterior variance). Theorem 1 of this paper addresses that gap. After formalising the problem (Section 2), we describe the concepts and intuition (Section 3) behind the main result (Section 4). The main tools are mechanical words (which are not sufficiently well-known) and Schur convexity. As these tools are associated with rather general theorems, we believe that future work (Section 5) should enable substantial generalisation of our results.
2 Problem and Index
We consider the problem of tracking time-series, which we call arms, in discrete time. The state of arm at time evolves as a standard-normal random walk independent of everything but its immediate past ( and all include zero). The action space is . Action makes an expensive observation of arm which is normally-distributed about with precision and we receive cheap observations of each other arm with precision where and means no observation at all.
Let be the state, observation, history and observed history, so that and Then we formalise the above as ( is the indicator function)
Note that this setting is readily generalised to by a change of variables.
Thus the posterior belief is given by the Kalman filter as where the posterior mean is and the error variance satisfies
(1) |
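The variance recursion described above can be sketched numerically. This is only an illustration: the function names and the parameters `cheap` and `expensive` (the two observation precisions) are hypothetical labels, and the update is the textbook scalar Kalman step for a random walk with unit-variance increments, which we assume is what (1) expresses.

```python
def posterior_variance(x, precision):
    """One scalar Kalman variance step for a unit-variance random walk.

    x is the current posterior variance and precision is the precision of
    the next observation (0 means no observation at all).
    """
    prior = x + 1.0  # the random-walk step adds unit variance
    return prior / (precision * prior + 1.0)  # equals 1 / (1 / prior + precision)


def variance_orbit(x0, actions, cheap, expensive):
    """Posterior variances along an action sequence (1 = expensive observation)."""
    orbit, x = [], x0
    for action in actions:
        x = posterior_variance(x, expensive if action == 1 else cheap)
        orbit.append(x)
    return orbit
```

Because the update does not depend on the observed values, the variance sequence is a deterministic function of the actions, which is the property used in Problem KF1.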
Problem KF1. Let be a policy so that . Let be the error variance under . The problem is to choose so as to minimise the following objective for discount factor . The objective consists of a weighted sum of error variances with weights plus observation costs for :
where the equality follows as (1) is a deterministic mapping (and assuming is deterministic).
Single-Arm Problem and Whittle Index. Now fix an arm and write instead of . Say there are now two actions corresponding to cheap and expensive observations respectively and the expensive observation now costs where . The single-arm problem is to choose a policy, which here is an action sequence,
(2) |
Let be the optimal cost-to-go in this problem if the first action must be and let be an optimal policy, so that
For any fixed , the value of for which actions and are both optimal is known as the Whittle index assuming it exists and is unique. In other words
The Whittle index is the solution to (3)
Let us consider a policy which takes action then acts optimally producing actions and error variances . Then (3) gives
Solving this linear equation for the index gives
(4) |
Whittle [24] recognised that for his index policy (play the arm with the largest ) to make sense, any arm which receives an expensive observation for added cost , must also receive an expensive observation for added cost . Such problems are said to be indexable. The question resolved by this paper is whether Problem KF1 is indexable. Equivalently, is non-decreasing in ?
3 Main Result, Key Concepts and Intuition
We make the following intuitive assumption about threshold (monotone) policies.
A1. For some depending on , the policy is optimal for problem (2).
Note that under A1, definition (3) means the policy is also optimal, so we can choose
(5) |
where . We refer to as the -threshold orbits (Figure 1).
We are now ready to state our main result.
Theorem 1. Suppose a threshold policy (A1) is optimal for the single-arm problem (2). Then Problem KF1 is indexable. Specifically, for any let
and for any and , let
(6) |
in which action sequences and error variance sequences are given in terms of by (5). Then is a continuous and non-decreasing function of .
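A rough numerical sketch of the index of Theorem 1 follows, for unit weight and zero base cost. It rests on several assumptions that are not taken from the text: the names `cheap`, `expensive`, `beta` and `horizon` are hypothetical, the infinite sums are truncated, the timing convention of the discounting is one plausible choice (a different convention only rescales the index by a positive constant), and ties at the threshold are broken by repeating the forced first action.

```python
def phi(x, precision):
    """Scalar Kalman variance update for a unit-variance random walk."""
    prior = x + 1.0
    return prior / (precision * prior + 1.0)


def threshold_orbit(x, first_action, cheap, expensive, horizon):
    """Actions and variances when the first action is forced and a threshold
    policy with threshold x (the starting variance) is followed afterwards."""
    actions, variances = [], []
    state, action = x, first_action
    for _ in range(horizon):
        state = phi(state, expensive if action == 1 else cheap)
        actions.append(action)
        variances.append(state)
        # Active above the threshold; at an exact tie repeat the forced action.
        action = 1 if (state > x or (state == x and first_action == 1)) else 0
    return actions, variances


def whittle_index(x, cheap, expensive, beta, horizon=400):
    """Truncated ratio: discounted variance saving over discounted extra plays."""
    a0, v0 = threshold_orbit(x, 0, cheap, expensive, horizon)
    a1, v1 = threshold_orbit(x, 1, cheap, expensive, horizon)
    saving = sum(beta ** t * (p - q) for t, (p, q) in enumerate(zip(v0, v1)))
    plays = sum(beta ** t * (q - p) for t, (p, q) in enumerate(zip(a0, a1)))
    return saving / plays
```

Evaluating `whittle_index` on an increasing grid of variances gives a quick empirical check of the monotonicity that the theorem asserts.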

We are now ready to describe the key concepts underlying this result.
Words. In this paper, a word is a string on with letter and . The empty word is , the concatenation of words is , the word that is the -fold repetition of is , the infinite repetition of is and is the reverse of , so means is a palindrome. The length of is and is the number of times that word appears in , overlaps included.
Christoffel, Sturmian and Mechanical Words. It turns out that the action sequences in (5) are given by such words, so the following definitions are central to this paper.
The Christoffel tree (Figure 2) is an infinite complete binary tree [5] in which each node is labelled with a pair of words. The root is and the children of are and . The Christoffel words are the words and the concatenations for all in that tree. The fractions form the Stern-Brocot tree [9] which contains each positive rational number exactly once. Also, infinite paths in the Stern-Brocot tree converge to the positive irrational numbers. Analogously, Sturmian words could be thought of as infinitely-long Christoffel words.
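The tree construction can be sketched in a few lines. We assume the standard construction from [5] on the alphabet of 0s and 1s, with root pair ("0", "1") and with the children of a pair (u, v) being (u, uv) and (uv, v); the function name `christoffel_tree` is a hypothetical label.

```python
def christoffel_tree(depth):
    """Christoffel words uv for every node (u, v) of the tree down to `depth`.

    Assumes the root pair ("0", "1") and children (u, uv) and (uv, v).
    """
    level, words = [("0", "1")], []
    for _ in range(depth + 1):
        words.extend(u + v for u, v in level)
        level = [child for u, v in level for child in ((u, u + v), (u + v, v))]
    return words
```

For example, `christoffel_tree(1)` returns `["01", "001", "011"]`. Each word has a distinct pair of letter counts, reflecting the correspondence with the Stern-Brocot tree.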
Alternatively, among many known characterisations, the Christoffel words can be defined as the words and the words where and
for any relatively prime natural numbers and and for . The Sturmian words are then the infinite words where, for and ,
We use the notation for Sturmian words although they are infinite.
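As an illustration of the arithmetic characterisation, the sketch below builds a Christoffel word from its letter counts using differences of floors. The parametrisation by `zeros` and `ones` and the function name are assumptions, since the exact form of the displayed formula is not recoverable here; the construction is the usual one for lower Christoffel words.

```python
def christoffel_word(zeros, ones):
    """Lower Christoffel word with the given (relatively prime) letter counts.

    Letter k is floor((k + 1) * ones / n) - floor(k * ones / n) with
    n = zeros + ones, computed in exact integer arithmetic.
    """
    n = zeros + ones
    return "".join(str(((k + 1) * ones) // n - (k * ones) // n) for k in range(n))
```

For instance, `christoffel_word(3, 2)` gives `"00101"`, which is also the word obtained from the pair (001, 01) in the Christoffel tree.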

The set of mechanical words is the union of the Christoffel and Sturmian words [13]. (Note that the mechanical words are sometimes defined in terms of infinite repetitions of the Christoffel words.)
Majorisation. As in [14], let and let and be their elements sorted in ascending order. We say is weakly supermajorised by and write if
If this is an equality for we say is majorised by and write . It turns out that
where are the sequences sorted in descending order. For we have [14]
More generally, a real-valued function defined on a subset of is said to be Schur-convex on if implies that .
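The definitions of this paragraph can be checked on small examples. The sketch below follows the ascending-order partial-sum definition of [14]; the function names and the tolerance `tol` are hypothetical choices.

```python
def weakly_supermajorised(x, y, tol=1e-12):
    """True if x is weakly supermajorised by y: each partial sum of the
    ascending-sorted x is at least the matching partial sum of y."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    sum_x = sum_y = 0.0
    for p, q in zip(sorted(x), sorted(y)):
        sum_x += p
        sum_y += q
        if sum_x < sum_y - tol:
            return False
    return True


def majorised(x, y, tol=1e-12):
    """True if x is majorised by y: weak supermajorisation with equal totals."""
    return weakly_supermajorised(x, y, tol) and abs(sum(x) - sum(y)) <= tol
```

For example, (2, 2) is majorised by (1, 3), and the Schur-convex sum of squares is correspondingly smaller on (2, 2) than on (1, 3).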
Möbius Transformations. Let denote the Möbius transformation where . Möbius transformations such as are closed under composition, so for any word we define and
Intuition. Here is the intuition behind our main result.
For any , the orbits in (5) correspond to a particular mechanical word or depending on the value of (Figure 1). Specifically, for any word , let be the fixed point of the mapping on so that and . Then the word corresponding to is 1 for , for and 0 for . In passing we note that these fixed points are sorted in ascending order by the ratio of counts of 0s to counts of 1s, as illustrated by Figure 3. Interestingly, it turns out that ratio is a piecewise-constant yet continuous function of , reminiscent of the Cantor function.
Also, composition of Möbius transformations corresponds to matrix multiplication (the map from matrices to Möbius transformations is a homomorphism), so that
Thus, the index (6) can be written in terms of the orbits of a linear system (11) given by or Further, if and then the gradient of the corresponding Möbius transformation is the convex function
So the gradient of the index is the difference of the sums of a convex function of the linear-system orbits. However, such sums are Schur-convex functions and it follows that the index is increasing because one orbit weakly supermajorises the other, as we now show for the case (noting that the proof is easier for words ). As is a mechanical word, is a palindrome. Further, if is a palindrome, it turns out that the difference between the linear-system orbits increases with . So, we might define the majorisation point for as the for which one orbit majorises the other. Quite remarkably, if is a palindrome then the majorisation point is (Proposition 7). Indeed the black circles and blue dots of Figure 3 coincide. Finally, is less than or equal to which is the least for which the orbits correspond to the word . Indeed, the blue dots of Figure 3 are below the corresponding black dots. Thus one orbit does indeed supermajorise the other.

4 Proof of Main Result
4.1 Mechanical Words
The Möbius transformations of (1) satisfy the following assumption for . We prove that the fixed point of word (the solution to on ) is unique in the supplementary material.
Assumption A2. Functions , where is an interval of , are increasing and non-expansive, so for all and for we have
Furthermore, the fixed points of on satisfy .
Hence the following two propositions (supplementary material) apply to of (1) on .
Proposition 1.
Suppose A2 holds, and is a non-empty word. Then
and
For a given , in the notation of (5), we call the shortest word such that the -threshold word. Proposition 2 generalises a recent result about -threshold words in a setting where are linear [18].
Proposition 2.
Suppose A2 holds and is a mechanical word. Then
Also, if with and then the - and -threshold words are and .
We also use the following very interesting fact (Proposition 4.2 on p.28 of [5]).
Proposition 3.
Suppose is a mechanical word. Then is a palindrome.
4.2 Properties of the Linear-System Orbits and Prefix Sums
Definition. Assume that and . Consider the matrices
so that the Möbius transformations are the functions of (1) and . Given any word , we define the matrix product
where is the identity and the prefix sum as the matrix polynomial
(7) |
For any , let be the trace of , let be the entries of and let indicate that all entries of are non-negative.
Remark. Clearly, so that for any word . Also, corresponds to the partial sums of the linear-system orbits, as hinted in the previous section.
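The correspondence between words, matrix products and Möbius transformations can be sketched as follows. The matrix form `[[1, 1], [c, 1 + c]]` for an observation of precision `c` is an assumption, derived from the scalar Kalman variance update for a unit-variance random walk, and the convention that the first letter of a word is applied first is likewise assumed; all function names are hypothetical.

```python
def kalman_matrix(precision):
    """Matrix whose Moebius transformation is x -> (x + 1) / (c * x + c + 1)."""
    return [[1, 1], [precision, 1 + precision]]


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mobius(A, x):
    """Moebius transformation associated with the 2x2 matrix A."""
    return (A[0][0] * x + A[0][1]) / (A[1][0] * x + A[1][1])


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def word_matrix(word, cheap, expensive):
    """Matrix product for a word, with the first letter applied first."""
    M = [[1, 0], [0, 1]]
    for letter in word:
        M = matmul(kalman_matrix(expensive if letter == "1" else cheap), M)
    return M


def mobius_gradient(A, x):
    """Derivative of the Moebius transformation: det(A) / (A21 * x + A22)^2."""
    return det(A) / (A[1][0] * x + A[1][1]) ** 2
```

Under these assumptions each letter matrix has unit determinant, so every word matrix does too, and the gradient reduces to the reciprocal of a squared affine function of the state.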
The following proposition captures the role of palindromes (proof in the supplementary material).
Proposition 4.
Suppose is a word, is a palindrome and . Then
1. for some ,
2. ,
3. If then for some ,
4. If is a prefix of then ,
5. ,
6. .
We now demonstrate a surprisingly simple relation between and .
Proposition 5.
Suppose is a palindrome. Then
(8) |
Furthermore, if then
(9) |
Proof.
Let us write . We prove (8) by induction on . In the base case . For , For , for some . For the inductive step, in accordance with Claim 1 of Proposition 19, assume for some word satisfying
For , and . Calculating the corresponding matrix products and sums gives
as claimed. For the claim also holds as . This completes the proof of (8).
Furthermore Part. Let and . Then
(10) |
by definition of . By Claim 1 of Proposition 19 and (8) we know that
Substituting these expressions and the definitions of into the definitions of and then into (10) for directly gives (although this calculation is long).
Now consider the case . Claim 2 of Proposition 19 says and clearly . Thus we can diagonalise as
so that So, if then and we already showed that . Otherwise , so implies which gives . Thus for any we have . ∎
4.3 Majorisation
The following is a straightforward consequence of results in [14] and is proved in the supplementary material. We emphasise that the notation has nothing to do with the notion of as a word.
Proposition 6.
Suppose and is a symmetric function that is convex and decreasing on . Then .
For any and any fixed word , define the sequences for and
(11) |
where and
Proposition 7.
Suppose is a palindrome and . Then and are ascending sequences on and for any .
Proof.
Clearly so and hence . So for any word and letter we have as . Thus and . In conclusion, and are ascending sequences on .
4.4 Indexability
Theorem 1.
The index of (6) is continuous and non-decreasing for .
Proof.
As weight is non-negative and cost is a constant we only need to prove the result for and we can use to denote a word. By Proposition 2, for some mechanical word . (Cases are clarified in the supplementary material.)
Let us show that the hypotheses of Proposition 7 are satisfied by and . Firstly, is a palindrome by Proposition 3. Secondly, and as is monotonically increasing, it follows that . Equivalently, so that by Proposition 1. Hence .
Thus Proposition 7 applies, showing that the sequences and , with elements and as defined in (11), are non-decreasing sequences on with . Also, is a symmetric function that is convex and decreasing on . Therefore Proposition 6 applies giving
(12) |
Also Proposition 2 shows that the -threshold orbits are and where and . So the denominator of (6) is
Note that for any . Then (12) gives
But is continuous for (as shown in the supplementary material). Therefore we conclude that is non-decreasing for . ∎
5 Further Work
One might attempt to prove that assumption A1 holds using general results about monotone optimal policies for two-action MDPs based on submodularity [2] or multimodularity [1]. However, we find counter-examples to the required submodularity condition. Rather, we are optimistic that the ideas of this paper themselves offer an alternative approach to proving A1. It would then be natural to extend our results to settings where the underlying state evolves as for some multiplier and to cost functions other than the variance. Finally, the question of the indexability of the discrete-time Kalman filter in multiple dimensions remains open.
References
- [1] E. Altman, B. Gaujal, and A. Hordijk. Multimodularity, convexity, and optimization properties. Mathematics of Operations Research, 25(2):324–347, 2000.
- [2] E. Altman and S. Stidham Jr. Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems, 21(3-4):267–291, 1995.
- [3] M. Araya, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards. In Neural Information Processing Systems, pages 64–72, 2010.
- [4] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 671–680, 2014.
- [5] J. Berstel, A. Lauve, C. Reutenauer, and F. Saliola. Combinatorics on Words: Christoffel Words and Repetitions in Words. CRM Monograph Series, 2008.
- [6] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, Vol. 5. NOW, 2012.
- [7] Y. Chen, H. Shioi, C. Montesinos, L. P. Koh, S. Wich, and A. Krause. Active detection via adaptive submodularity. In Proceedings of The 31st International Conference on Machine Learning, pages 55–63, 2014.
- [8] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.
- [9] R. Graham, D. Knuth, and O. Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, 1994.
- [10] S. Guha, K. Munagala, and P. Shi. Approximation algorithms for restless bandit problems. Journal of the ACM, 58(1):3, 2010.
- [11] B. La Scala and B. Moran. Optimal target tracking with restless bandits. Digital Signal Processing, 16(5):479–487, 2006.
- [12] J. Le Ny, E. Feron, and M. Dahleh. Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control, 56(6):1381–1394, 2011.
- [13] M. Lothaire. Algebraic combinatorics on words. Cambridge University Press, 2002.
- [14] A. Marshall, I. Olkin, and B. Arnold. Inequalities: Theory of majorization and its applications. Springer Science & Business Media, 2010.
- [15] L. Meier, J. Peschon, and R. Dressler. Optimal control of measurement subsystems. IEEE Trans. Automatic Control, 12(5):528–536, 1967.
- [16] J. Niño-Mora and S. Villar. Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. In Proceedings of the 48th IEEE Conference on Decision and Control, pages 2905–2910, 2009.
- [17] R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In Algorithmic Learning Theory, pages 214–228. Springer, 2012.
- [18] B. Rajpathak, H. Pillai, and S. Bandyopadhyay. Analysis of stable periodic orbits in the one dimensional linear piecewise-smooth discontinuous map. Chaos, 22(3):033126, 2012.
- [19] T. Thiele. Sur la compensation de quelques erreurs quasi-systématiques par la méthode des moindres carrés. CA Reitzel, 1880.
- [20] I. Verloop. Asymptotic optimal control of multi-class restless bandits. CNRS Technical Report, hal-00743781, 2014.
- [21] S. Villar. Restless bandit index policies for dynamic sensor scheduling optimization. PhD thesis, Statistics Department, Universidad Carlos III de Madrid, 2012.
- [22] E. Vul, G. Alvarez, J. B. Tenenbaum, and M. J. Black. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. In Neural Information Processing Systems, pages 1955–1963, 2009.
- [23] R. R. Weber and G. Weiss. On an index policy for restless bandits. Journal of Applied Probability, pages 637–648, 1990.
- [24] P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, pages 287–298, 1988.
6 Supplementary Material: Introduction
The results used but not proved in the main paper are given here as:
- Proposition 9, which was used to show that ,
- Proposition 16, for the range of giving a specific mechanical word,
- Proposition 17, showing the index is continuous for ,
- Proposition 19, showing the properties of when is a palindrome,
- Proposition 20, for weak supermajorisation with .
A clarification of the extreme cases of Theorem 1 of the main paper is presented in the final section.
7 From -Threshold Policies to Mechanical Words
Some concepts relating to mechanical words appeared as early as 1771 in Jean Bernoulli’s study of continued fractions (Berstel et al., 2008). The term “mechanical sequences” appears in the work of Morse and Hedlund (Am. J. Math., Vol. 62, No. 1, 1940, pp. 1-42), who had just introduced the term “symbolic dynamics”. Morse and Hedlund studied the concept from the perspective of sequences of the form for and . They also studied the concept from the perspective of differential equations, motivating the term “Sturmian sequences”. Since that time there has been tremendous progress in the study of such sequences from the perspective of Combinatorics on Words (Lothaire, 2002). However, the recent (and highly approachable) paper of Rajpathak, Pillai and Bandyopadhyay (Chaos, Vol. 22, 2012) on the piecewise-linear map-with-a-gap discovers such sequences without recognising them as mechanical sequences. Proposition 16 of this section is a substantial generalisation of that result and we could not find this proposition explicitly stated in the literature. Our result is not surprising if one has the intuition that there is a topological conjugacy between the maps of this section and the piecewise-linear map-with-a-gap. However, it might be difficult to explicitly identify the appropriate topological conjugacy and thereby prove our result for all cases considered here.
7.1 Definitions
Let denote a word consisting of a string of 0s and 1s in which the letter is and letters are . Let be the length of and for a word be the number of times that word appears in . Let denote the empty word and denote the infinite word constructed by repeatedly concatenating .
Consider two functions and where is an interval of We define the transformation for any word by the composition
Let be the fixed point of , so , assuming a unique fixed point on exists.
Given , we call the sequence the -threshold orbit for if
We call the -threshold word for if it is the shortest word such that for all . We shall just write -threshold orbit and -threshold word where are obvious from the context.
For , let be the morphisms (substitutions)
We say is a valid word if or for some valid word .
Remark. The morphisms generate the Christoffel tree so valid words are mechanical words. To see this, note that the Christoffel tree is generated by the following morphisms (Berstel et al, 2008, p. 37)
We may translate (from English to French) as and so any composition of and can be written as a composition of and . Likewise, any composition of and can be written as a composition of and . Specifically if then
whereas if we have
A symmetric argument holds if or .
7.2 Fixed Points
Throughout, we make the following assumption about . The existence of fixed points is addressed immediately thereafter.
Assumption A2. Functions , where is an interval of , are increasing and non-expansive. Equivalently, for all and for we have
Furthermore, the fixed points of satisfy .
Proposition 8.
Suppose A2 holds, that and that is any non-empty word. Then is increasing and non-expansive. Further, the fixed point exists and is unique.
Proof.
First we show that is increasing, by induction. In the base case, and the claim follows from A2. For the inductive step assume is increasing, where for some and word . Then for any ,
as and is increasing.
Therefore is increasing.
Now we show that is non-expansive, by induction. If then this follows from A2. Else, say is non-expansive where and . Then for any ,
as and is non-expansive,
as is non-expansive.
Therefore is non-expansive.
Let . As is non-expansive we have
which rearranges to give , so that . Also is increasing as are increasing, so .
We now prove that exists. The argument of the previous paragraph shows that satisfies . A symmetric argument leads to the conclusion that . Clearly is a continuous function, so by the intermediate value theorem, there is some for which . Equivalently . Therefore a fixed point exists.
To show that the fixed point is unique, suppose both and are fixed points with . As is non-expansive we have . Yet, as we have
This is a contradiction. Therefore the fixed point is unique. ∎
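The existence argument above is constructive enough to turn into a small numerical sketch: since the composed map is continuous, increasing and non-expansive, the fixed point can be located by bisection on the sign of the map minus the identity. The concrete maps used here are an assumption (the scalar Kalman variance updates with hypothetical precisions `cheap` and `expensive`, with `cheap` positive so that the all-zero word also has a fixed point), as are the bracket and iteration count.

```python
def phi(x, precision):
    """Scalar Kalman variance update for a unit-variance random walk."""
    prior = x + 1.0
    return prior / (precision * prior + 1.0)


def apply_word(word, x, cheap, expensive):
    """Apply the letters of the word from left to right."""
    for letter in word:
        x = phi(x, expensive if letter == "1" else cheap)
    return x


def fixed_point(word, cheap, expensive, lo=0.0, hi=1e6, iterations=200):
    """Bisection for the unique non-negative fixed point of a non-empty word.

    Assumes the map exceeds the identity at lo and is below it at hi, which
    holds for the maps above when the precisions are positive and hi is large.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if apply_word(word, mid, cheap, expensive) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With `cheap = 0.1` and `expensive = 1.0`, the fixed points of the single-letter words are the positive roots of x² + x − 1 = 0 and x² + x − 10 = 0, and the fixed points of mixed words fall strictly between them.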
Given a word , the next proposition shows when the transformation increases or decreases its argument and what might be deduced from such an increase or decrease.
Proposition 9.
Suppose A2 holds, and is any non-empty word. Then
and
Proof.
We use Proposition 8 throughout the argument without further mention.
Say . As is increasing,
where the equality is the definition of . Also, as is non-expansive,
which rearranges to give .
Now say . As above, we then have and
so that .
The contrapositive of is . But if then as is increasing and therefore injective. Thus .
The contrapositive of is . But if then as is a fixed point. So we can conclude that .
By symmetry, and . This completes the proof. ∎
Proposition 10.
Suppose A2 holds and is any word satisfying . Then .
Proof.
Say . As we can write for some . Thus
as is increasing,
as ,
by Proposition 9,
by repeating the same argument if .
But this contradicts . Therefore .
A symmetrical argument leads to the conclusion that . ∎
Proposition 11.
If A2 holds and then and
Proof.
The proof that is symmetrical. ∎
Proposition 12.
Suppose A2 holds, and is any word. Let be the fixed point of for any word and let . Then
Proof.
Say . Note that
as is the fixed point of ,
as ,
as for any words ,
as ,
and
Proposition 8 shows that exist. So the above equalities show that an inverse exists for . As is increasing and continuous, we have
The proof for is symmetric. ∎
7.3 -Threshold Words
Proposition 13.
Suppose A2 holds, is the -threshold word and . Then
-
1.
-
2.
-
3.
-
4.
Proof.
If then it follows from Proposition 9 that the -threshold word is . Likewise if then the -threshold word is . In these cases Claims 1 and 2 hold, so in the following we assume that .
Claim 1: Let be the -threshold orbit. If for some , then
by definition of ,
as for all and is increasing,
if by Proposition 9.
But if then by definition . Therefore
Claim 2: Let be the -threshold orbit. If for some , then
as for all and is increasing,
if by Proposition 9.
But if then . Therefore
The proof of Claims 3 and 4 is symmetrical. ∎
Proposition 14.
Suppose A2 holds and is a -threshold word. Then
1. for some word and some
2. for some word and some
Proof.
First, applying Claims 1 and 3 of Proposition 13 with we have for and for . Furthermore by Proposition 9. Thus cannot contain both 00 and 11.
So, if then is of the form with strings of 0s separated by individual 1s. Let . By Propositions 11 and 13, is the only set of values for which can contain . Thus can only contain both and in the interval
noting Proposition 11 gives
Finally, we have for , which also follows from Proposition 11. Thus if then is a concatenation of and . Equivalently for some word and some as in Claim 1.
The proof of Claim 2 is symmetric. ∎
Proposition 15.
Suppose A2 holds and is a -threshold word. Then is a valid word.
Proof.
There are three cases to consider: either or or .
First case: The only non-empty words not containing or are for some . Now -threshold words start with 0 unless (in which case ) so . Further, the -threshold word was defined to be the shortest word such that so this leaves us with the options . These are all valid words.
Second case: If contains 00, we may write for some word , by Proposition 14. Now from point on the -threshold orbit we have if and only if which corresponds to . So the word corresponds to a -threshold orbit for . To spell it out, we have
for |
and as for the original system, we define as the fixed point .
Now are non-negative, as are non-negative. Also are monotonically increasing and non-expansive by Proposition 8. Further,
so that by Proposition 9. But by definition and , so that . Therefore satisfy A2.
Third case: We prove that for some positive integer and word . We also show that word is a -threshold word for a pair of functions (say) which satisfy A2. The argument is symmetric to the second case, so it is omitted.
In conclusion, either
1. which are valid words,
2.
3.
Thus is a valid word. This completes the proof. ∎
The following proposition shows that all valid words are -threshold words and tells us explicitly which values of produce a given valid word. It is one of the key results of the main paper.
Proposition 16.
Suppose A2 is satisfied and is any valid word. Then
Proof.
Let . Note that contains and which for equals and for equals . Thus is the set of all valid words of the form .
We use induction with hypothesis
Base case (). Say is the -threshold word. Then
for all
The definition of the -threshold word also gives . Therefore by Proposition 9. Thus if is the -threshold word then .
Now say . Proposition 10 gives so that the -threshold orbit is contained in . So Proposition 9 shows that and for all . So to prove that the -threshold word is we need only show that and for all . But if then for all
by Proposition 9,
as by Claim 3 of Proposition 11.
Also if then for all by Proposition 9. Therefore for , we have implies that is the -threshold word.
For , the proof that is symmetric, so it is omitted.
Inductive Step. Assume satisfies .
Say . Let so is aligned with the start of the letter of . Let and let denote the fixed point of for any word . Then we have
is the -threshold word for
if and only if
if and only if
is the -threshold word for
as satisfies
by Proposition 12
Symmetrically we may conclude that . Therefore is true.
This completes the proof. ∎
8 Continuity of the Index
We showed that the Whittle index is increasing on the domain of each fixed Christoffel word. However, we also need to show that the index is continuous as we move between words. So here we prove the following proposition.
Proposition 17.
Suppose is as in the main paper. Then is a continuous function of .
We use the following definitions.
Definition. Let be the reverse of word , be the word constructed by concatenating infinitely many times, be the length of word and be the number of times that word is a factor of .
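The word operations in this definition can be made concrete with a short sketch. The function names below are hypothetical (they do not come from the paper), words are represented as Python strings over the letters 0 and 1, the infinite repetition is represented by a finite prefix, and we assume that overlapping occurrences of a factor are counted separately.

```python
from itertools import cycle, islice


def reverse(w: str) -> str:
    """Return the reverse of the word w."""
    return w[::-1]


def periodic_prefix(w: str, n: int) -> str:
    """Return the first n letters of the infinite word www..."""
    return "".join(islice(cycle(w), n))


def count_factor(u: str, w: str) -> int:
    """Count the positions at which u occurs as a factor of w.

    Assumption: overlapping occurrences are counted separately.
    """
    if not u:
        raise ValueError("the factor u must be non-empty")
    return sum(1 for i in range(len(w) - len(u) + 1) if w.startswith(u, i))
```

For instance, under these assumptions the factor 00 occurs twice in 0001, and the length of a word is simply `len(w)`.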
Definition. For a possibly-infinite word and numbers define
Remark. If is the -threshold word then where is the Whittle index.
Remark. For a word , this definition gives
(13)
so for and we have
(14)
Further, if then the formula for the sum of a geometric progression gives
(15)
Definition. Let be the range of for which the -threshold word is .
The following construction is closely related to the beautiful Christoffel tree (Berstel et al, 2008).
Definition. Consider the mapping which takes a sequence of words and returns a sequence containing the original words mingled with the concatenation of neighbouring words as follows:
Now consider the sequences for . The first few such sequences are
.
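The mingling construction can be sketched in a few lines. The names `mingle` and `level` are hypothetical, and the starting sequence (0, 1) is an assumption, chosen because it is the usual root of the Christoffel tree.

```python
def mingle(words):
    """Insert the concatenation of each neighbouring pair between that pair."""
    if not words:
        return []
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        out.append(left + right)  # concatenation of the two neighbours
        out.append(right)
    return out


def level(n, start=("0", "1")):
    """Apply mingle n times to the starting sequence (assumed to be 0, 1)."""
    seq = list(start)
    for _ in range(n):
        seq = mingle(seq)
    return seq


if __name__ == "__main__":
    for n in range(4):
        print(n, level(n))
```

Under this assumption, one application gives 0, 01, 1 and two applications give 0, 001, 01, 011, 1. Each application keeps every existing word and adds one new word per adjacent pair, so a sequence of m words becomes one of 2m - 1 words.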
Remark. If then for any . Now suppose are adjacent in and we have . Then contains from which we can construct and . But and . Thus, by induction, we have shown that
for any adjacent pair in and any . (16)
8.1 Long Common Prefixes
We gather the results needed to prove Proposition 17. Most of these results relate to the notion that if is small and are the - and -threshold words, then words usually have a long common prefix, although this is not always the case.
The following simple result is repeatedly used in the other Lemmas of this subsection.
Lemma 1.
Suppose is a standard pair. Then .
Proof.
As is a standard pair, is a Christoffel word. As are Christoffel words, are palindromes. Thus ∎
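The proof above rests on the classical fact that a Christoffel word of length at least two has the form 0p1 with p a palindrome. The following sketch checks that fact numerically on words generated by repeatedly concatenating neighbours, starting from the assumed sequence (0, 1). It does not verify the identity stated in Lemma 1 itself, and all function names are hypothetical.

```python
def is_palindrome(w: str) -> bool:
    """Return True if w reads the same forwards and backwards."""
    return w == w[::-1]


def christoffel_words(depth: int):
    """Words of length >= 2 obtained after `depth` rounds of mingling (0, 1)."""
    seq = ["0", "1"]  # assumed starting sequence
    for _ in range(depth):
        nxt = [seq[0]]
        for left, right in zip(seq, seq[1:]):
            nxt += [left + right, right]
        seq = nxt
    return [w for w in seq if len(w) >= 2]


def central_part(w: str) -> str:
    """Strip the first and last letter of w."""
    return w[1:-1]


def check(depth: int) -> bool:
    """Check that every generated word is 0 p 1 with p a palindrome."""
    return all(
        w[0] == "0" and w[-1] == "1" and is_palindrome(central_part(w))
        for w in christoffel_words(depth)
    )
```

For example, the word 00101 has central part 010, which is a palindrome.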
If is a standard pair, then the interval is immediately to the left of . Since the words and can differ within the first few letters, continuity of at is not obvious. Similarly, is immediately to the left of . However, the factors and appearing in the definitions of the corresponding Whittle indices are different for . Thus continuity of at is not obvious. The next two Lemmas address these questions.
Lemma 2.
Suppose is a standard pair and let . Then
Proof.
Now we note that repeated application of Lemma 1 gives
(18)
Lemma 3.
Suppose is a standard pair and let . Then
Proof.
To demonstrate continuity at other points, we will need to rely on the fact that nearby words often have a long common prefix as shown by the following two Lemmas.
Lemma 4.
Suppose is a subsequence of for some . Then is a prefix of both and .
Proof.
Let indicate that is a prefix of word and consider the statements
It suffices to show that and are true for any adjacent words in for . This is because
where the last equality follows from Lemma 1 and
which are the claims of the Lemma.
We shall use induction. Take as the base case. We must show that are true. However these statements are respectively that and are all true.
Otherwise, say are true for any adjacency in . Let so
using Lemma 1 again. Then the statements are all true as
by and as
as
as
by and as .
Thus are true for all adjacencies in . This completes the proof. ∎
Lemma 5.
Suppose are adjacent in and that lies strictly between them in for some . Then .
Proof.
The interval of between is constructed from in exactly the same way as was constructed from . Thus for some positive integer . Now recall that by Lemma 1. Thus as claimed. ∎
Although the existence of a long common prefix for nearby words suggests continuity, to prove anything we must bound the residual after removing the long common prefix. The following Lemma is one way to achieve this.
Lemma 6.
Suppose , let be the -threshold word and let where . Then
Proof.
The highest point on the orbits and is since is the -threshold word. The terms of the discounted sums
and
are from the orbits and and for any word as . Therefore terms , are also no higher than . Furthermore, terms are non-negative, so that . Thus ∎
Although it is clear that is continuous, a bound on its slope is helpful.
Lemma 7.
Suppose and that is a valid word. Then .
Proof.
The definition of gives
where the second inequality follows as and for any word since . ∎
We use one more result about of the main paper.
Lemma 8.
Suppose and are as in the main paper and . Then .
Proof.
The definitions of give
which is positive as and . ∎
Our proof of continuity will rely on the standard definition in which we will put where is defined in the following Lemma.
Lemma 9.
For any there is a such that
Proof.
Say are adjacent in . Then by construction of , the gap contains intervals corresponding to words of . Each of these intervals is at most in length. Thus . This demonstrates the existence of a such that .
To show that for finite , we shall demonstrate that assuming leads to a contradiction. If then there is some word such that . Therefore . Now in , functions have inverses, so is well-defined. Therefore
which contradicts Lemma 8 as . ∎
8.2 Proof of Continuity
Proof.
We wish to show that for any , there exists a such that for any we have . Without loss of generality we assume that .
Specifically, we shall put where is as defined in Lemma 9 and is any positive integer such that and such that . The existence of such a is guaranteed by Lemma 9 and because .
Let be the - and -threshold words. If these words are the same then
where the second inequality follows from Lemma 7, the third from and the fourth from the definition of .
Otherwise . In this case, let be the standard pair for word , let and . Noting that , our strategy is to write
It remains to consider . It follows from the definition of , that for some adjacent words in : either or is a word strictly between and in the sense of Lemma 5; and that is a word strictly between and . Thus by Lemma 5 we have and where and are the appropriate suffixes. Therefore the definition of gives
(22)
where the last four inequalities follow from the triangle inequality, from Lemma 6, from equation 16 coupled with the fact that and finally from the definition of .
9 Properties of the Linear-System Orbits
Recall the definitions about words from the main paper, particularly that is the reverse of . Also, recall the definitions of matrices . The first of the following propositions is used to prove the second. The second appears in the main paper.
Proposition 18.
Suppose are any words. Then
1.
2. ,
3. for some ,
4. for some ,
5. ,
6. .
Proof.
as gives Claim 1.
Claim 2. The definitions of give . Thus
.
The result follows as .
Claim 3. Put
and solve for .
Claim 4. Substituting Claim 2 and Claim 3 in Claim 4 gives
Claim 5. Put .
We calculate
as , and .
The result follows as and .
Claim 6.
If then .
Otherwise we use induction
on to show that
where .
In the base case so
For the inductive step, assume for some word satisfying . Then
As , this completes the proof. ∎
Proposition 19.
Suppose is a word, is a palindrome and . Then
1. for some ,
2. ,
3. If then for some ,
4. If is a prefix of then ,
5. ,
6. .
Proof.
In this proof, we refer to Claim $k$ of Proposition 18 as P$k$.
Claim 1. P2 gives as .
But in the notation of P3,
says .
Solve this for and substitute in P3.
Claim 2. Noting that ,
the notation of Claim 1 gives
Claim 3.
Note we can move from to just by swapping some for . So, repeated application of P5 gives the inequality
But the denominators of this inequality are equal (and non-negative)
as P4 gives for some .
Thus this inequality reduces to
.
Yet P4 also gives
which combined with the previous sentence says that . As , this gives .
Claim 4. Let be the corresponding suffix so
and
But Claim 3 with gives
Claim 5. As , Claim 3 with gives
Claim 6. Let Then , so that
by Claim 5. This completes the proof. ∎
10 Majorisation
In the main paper, we used one result about majorisation which is similar, but not identical, to results in Marshall, Olkin and Arnold (2011). Let us prove that result.
Proposition 20.
Suppose and is a symmetric function that is convex and decreasing on . Then .
Proof.
As the claim relates to and we assume that and are in ascending order.
Marshall et al (3H2B, page 133) says that if is a non-decreasing and convex function on and is a non-increasing and non-negative sequence, then for all non-increasing sequences the function is Schur-convex.
Indeed the function is increasing and convex for (such as and ) and is a non-increasing and non-negative sequence for . Thus for all non-increasing sequences on the function is Schur-convex.
Recall (ibid, page 12) that is said to be weakly submajorised by , written if
and that (ibid, page 13).
However (ibid, 3A8, page 87) if is a real function on which is non-decreasing in each argument and Schur-convex on and on then .
Indeed, the function is a real function on which is non-decreasing in each argument and Schur-convex on for all non-increasing sequences . Furthermore, as . Therefore as claimed. ∎
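As an illustration of the ordering used in this proof, the following sketch tests weak submajorisation in the standard sense of Marshall et al: every partial sum of the k largest entries of x is at most the corresponding partial sum for y. The function name and the tolerance parameter are hypothetical, and the sketch illustrates the definition only, not Proposition 20.

```python
def weakly_submajorised(x, y, tol=1e-12):
    """Return True if x is weakly submajorised by y.

    Both vectors are sorted in non-increasing order and the running sums
    of their k largest entries are compared for every k.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    sum_x = sum_y = 0.0
    for a, b in zip(xs, ys):
        sum_x += a
        sum_y += b
        if sum_x > sum_y + tol:  # tol is an assumed floating-point allowance
            return False
    return True


def sum_of_squares(v):
    """A simple function that is Schur-convex and non-decreasing on non-negative vectors."""
    return sum(t * t for t in v)
```

For example, (2, 2) is weakly submajorised by (3, 1), and correspondingly `sum_of_squares` takes the values 8 and 10 on these two vectors.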
11 Clarification of Theorem 1 for or
Recall the following definitions and assumption from the main paper
If or then the relevant linear systems, (9) in the main paper, are
where both inequalities follow as are all , as and as Therefore all cumulative sums of the above expressions are non-negative so the derivative of the numerator of the Whittle index is non-negative by the same weak-supermajorisation argument as in the main paper.
Meanwhile, the denominator of the index in these cases is
which is non-negative. Therefore the rest of the proof of Theorem 1 follows as in the main paper.
In fact we could say that the majorisation point, which is for words in the main paper, is in both cases. Indeed, Claim 6 of Proposition 4 of the main paper says that . Also, Thus for all , whereas for any