Multiple-Sequence Prophet Inequality Under Observation Constraints
Abstract
In our problem, we are given access to a number of sequences of nonnegative i.i.d. random variables, whose realizations are observed sequentially. All sequences have the same finite length. The goal is to pick one element from each sequence in order to maximize a reward equal to the expected value of the sum of the selections from all sequences. The decision on which element to pick is irrevocable, i.e., rejected observations cannot be revisited. Furthermore, the procedure terminates upon having a single selection from each sequence. Our observation constraint is that we cannot observe the current realization of all sequences at each time instant. Instead, we can observe only a smaller, yet arbitrary, subset of them. Thus, together with a stopping rule that determines whether we choose or reject a sample, the solution requires a sampling rule that determines which sequences to observe at each instant. The problem can be solved via dynamic programming, but with a complexity that is exponential in the length of the sequences. In order to make the solution computationally tractable, we introduce a decoupling approach and determine each stopping time using either single-sequence dynamic programming or a Prophet Inequality inspired threshold method, with polynomial complexity in the length of the sequences. We prove that the decoupling approach guarantees at least 0.745 of the optimal expected reward of the joint problem. In addition, we describe how to efficiently compute the optimal number of samples for each sequence, and its dependence on the variances of the sequences.
I Introduction
In many applications, multiple data sequences are monitored sequentially with the aim of deciding the best instant to terminate the observation procedure and use the collected information to maximize an objective. Financial applications include the design of posted pricing mechanisms for auctions [1, 2, 3] and contention resolution schemes [4]. Recent engineering applications focus on the optimization of computer hardware performance, such as computational sprinting [5, 6], which provides a significant performance boost to microchips. These applications, among many others, have motivated a large body of work in optimal stopping theory, with dynamic programming [7, Ch. 24] being the primary solution method for a broad class of them.
In systems with multiple data sequences, it is not always possible to observe the current sample of every sequence, due to resource limitations or other observation constraints [8, 9]. For example, when following a large number of auctions in parallel, it is difficult to analyze all offers at all times. In computational sprinting hardware mechanisms, software predicts the performance boost of a microchip when short-term overheating is allowed, but limited computational resources make it impossible to run the software for all microchips at each instant. In both cases, only a limited number of sequences can be processed at any given instant, although there is no restriction on which ones they are. This type of observation constraint leads to a problem at the intersection of optimal stopping and multi-armed bandit theory [10].
In our problem, we assume that the data sequences comprise i.i.d. random variables that are independent across sequences, and whose distributions are allowed to differ. Our constraint is that we can only observe a fixed-size subset of the sequences at each instant. Our goal is to determine which sequences to observe at each instant, and when to stop sampling each sequence, in order to maximize our reward, which is the sum of the expected values of the observations at the selected stopping times. This is a combined optimal stopping and sampling problem, which could be treated by dynamic programming with a complexity that is exponential in the length of the sequences. In order to reduce the complexity, we introduce a decoupling approach, based on the Prophet Inequality [3, 11, 12, 13, 14], that reduces the complexity to polynomial and guarantees at least 0.745 of the optimal expected reward of the joint problem.
The Prophet Inequality compares the expected value of the element picked at the optimal stopping time of a sequence with the expected value of the maximum element of that sequence, by providing a tight lower bound on the ratio of the respective expected values. For an i.i.d. sequence, the lower bound is proven in [3, 15] to be approximately 0.745, independently of the distribution and of the length of the sequence. This benchmark has motivated the design of stopping rules for our problem which always satisfy the Prophet Inequality, although they might be sub-optimal in some cases. In [3, Corollary 4.7], an algorithm is provided for the computation of the thresholds of such a stopping rule, which is simpler than that of single-sequence dynamic programming (Single-DP), although both methods have linear complexity in the length of the sequence. We refer to the algorithm in [3] as the Prophet Inequality thresholding method (PI-thresholding).
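To make the Single-DP baseline concrete, the following minimal Python sketch (our own illustration, not code from [3] or [7]; the function names and the Monte Carlo estimation of expectations are assumptions of the sketch) computes the backward-induction thresholds for a single sequence of n i.i.d. draws and applies the resulting stopping rule. PI-thresholding would reuse the same acceptance loop, with the backward-induction thresholds replaced by the quantile-based thresholds of [3, Corollary 4.7].

import numpy as np

def single_dp_thresholds(sampler, n, mc=200_000, rng=None):
    # Backward induction for a single i.i.d. sequence of length n:
    # with j draws still ahead, the continuation value is v_j = E[max(X, v_{j-1})],
    # starting from v_1 = E[X].  Expectations are estimated by Monte Carlo here.
    rng = np.random.default_rng(rng)
    x = sampler(mc, rng)                    # i.i.d. draws used to estimate expectations
    v = [float(np.mean(x))]                 # v_1 = E[X]
    for _ in range(n - 1):
        v.append(float(np.mean(np.maximum(x, v[-1]))))
    # Accept the t-th observation iff it is at least the value of the n - t draws left;
    # the last observation is always accepted.
    return [v[n - t - 1] if t < n else -np.inf for t in range(1, n + 1)]

def stop_with_thresholds(xs, thresholds):
    # Returns (index, value) of the first observation meeting its threshold.
    for t, (x, b) in enumerate(zip(xs, thresholds), start=1):
        if x >= b:
            return t, x

# Example: thresholds for n = 10 draws from Uniform(0, 1).
thr = single_dp_thresholds(lambda m, rng: rng.uniform(0.0, 1.0, m), n=10)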
The basic requirement of the decoupling approach is the calculation of the optimal number of samples for each sequence, based on which we can apply Single-DP or PI-thresholding to each sequence separately. This results in a constrained optimization problem that depends on the distributions of all sequences. The computation might be intensive for long sequences; thus, under some smoothness assumptions on the optimization objective, we develop an approximation technique for the number of samples, whose error converges to zero as the length of the sequences goes to infinity.
The paper is organized as follows. In Section II, we present the problem formulation, while in Section III we prove our first result, pertaining to the approximation ratio of the decoupling approach. Then, in Section IV, we describe the approximation technique for the computation of the optimal number of samples for each sequence. Finally, in Section V, we present computational examples, which support the efficiency of the decoupling approach for both the Single-DP and the PI-thresholding method.
II Problem formulation
Let $(E, \mathcal{E})$ be an arbitrary measurable space, with $E \subseteq [0, \infty)$, and $(\Omega, \mathcal{F}, \mathsf{P})$ be a probability space which hosts $K$ independent sequences of i.i.d. $E$-valued random variables,
$\{X^k_t : t \in [n]\}, \qquad k \in [K],$
where $[n] := \{1, \dots, n\}$ and $[K] := \{1, \dots, K\}$. We aim to find the stopping time $\tau_k$, for each sequence $k \in [K]$, that attains
$\sup_{\tau \in \mathcal{T}} \mathsf{E}\big[ X^k_{\tau} \big], \qquad (1)$
where $\mathcal{T}$ is the class of all stopping times that take values in $[n]$. Formally, a stopping time is a random variable $\tau : \Omega \to [n]$ for which the event $\{\tau = t\}$, $t \in [n]$, is fully determined by the observations up to time $t$. Since each sequence $k$ is associated to a stopping time $\tau_k$, we make subsequent use of the vector of stopping times of all sequences, i.e., $\boldsymbol{\tau} := (\tau_1, \dots, \tau_K)$.
The observations from each sequence are made sequentially, and our decision to stop at time $\tau_k$ and pick $X^k_{\tau_k}$ as our “best” choice for the optimization problem (1) is irrevocable, i.e., we cannot revisit samples we rejected, nor can we examine samples that follow after we stop. The Prophet inequality [3, 15] provides a tight lower bound of $\gamma \approx 0.745$ on the ratio of (1) over $\mathsf{E}\big[\max_{1 \le t \le n} X^k_t\big]$, for each $k \in [K]$. By the term “tight”, we mean that there exist distributions for which the inequality holds with equality [15].
The main aspect that distinguishes our problem from the relevant bibliography is the introduction of observation constraints to our model, i.e., it is not possible to observe the current element of every sequence at each instant $t \in [n]$, but only of a subset of them of size $s < K$, with no restriction on which sequences belong to it. Hence, we denote by $R_t \subseteq [K]$ the subset of sequences that are observed at time $t$.
We say that $R := (R_t)_{t \in [n]}$ is a sampling rule if, for every $t \in [n]$, the set $R_t$ is $\mathcal{F}^R_{t-1}$-measurable, where $\mathcal{F}^R_t$ is the $\sigma$-algebra generated by the elements observed up to time $t$ according to the rule $R$, i.e.,
$\mathcal{F}^R_t := \sigma\big( X^k_u \,:\, k \in R_u,\ u \le t \big), \qquad \mathcal{F}^R_0 := \{\emptyset, \Omega\}.$
The sampling rule $R$ belongs to the class $\mathcal{R}_s$ if the number of sequences observed at each sampling instant is equal to $s$, i.e.,
$|R_t| = s, \qquad \text{for all } t \in [n]. \qquad (2)$
Our goal is to find a sampling rule $R \in \mathcal{R}_s$, together with a vector of stopping times $\boldsymbol{\tau}$ adapted to $\{\mathcal{F}^R_t\}$ and such that $k \in R_{\tau_k}$ for each $k \in [K]$, that attains
$\sup_{R \in \mathcal{R}_s}\ \sup_{\boldsymbol{\tau}}\ \sum_{k=1}^{K} \mathsf{E}\big[ X^k_{\tau_k} \big], \qquad (3)$
under the assumption that, for each $k \in [K]$, the i.i.d. sequence $\{X^k_t : t \in [n]\}$ has a finite first moment.
III The Decoupling Approach
The optimization problem (3) can be solved via dynamic programming, but with a computational complexity that grows exponentially with the length $n$ of the sequences. In order to reduce the complexity, we describe a decoupling approach that produces a constant-factor approximation for (3), with factor $\gamma \approx 0.745$ and with complexity polynomial in $n$.
For any sampling rule $R$, we denote by
$N^k_t(R) := \sum_{u=1}^{t} \mathbf{1}\{k \in R_u\}$
the total number of elements we have observed from sequence $k$ up to time $t$, and by $\tau_k$ the optimal stopping time of sequence $k$ associated with the sampling rule $R$. The decoupling approach consists of the following steps:
(i) Since for each $k \in [K]$ the sequence $\{X^k_t\}$ is i.i.d., it suffices to determine the number of observations $m_k$ for sequence $k$, without pinpointing the exact times at which we observe it. The values $m_1, \dots, m_K$ shall satisfy a particular optimization criterion, independent of the sampling rule.

(ii) We design a sampling rule which guarantees $m_k$ observations for each sequence $k \in [K]$, and respects the sampling constraint (2), which implies
$\sum_{k=1}^{K} m_k \;\le\; s\, n. \qquad (4)$
One simple rule of this type is sketched right after this list.

(iii) Given the $m_k$ observations for each $k \in [K]$, in order to determine the decoupled optimal stopping times $\tau_1, \dots, \tau_K$, we can use either single-sequence dynamic programming [7, Chapter 24], or the Prophet Inequality thresholding method in [3, Corollary 4.7]. The latter, although computationally simpler, may yield a sub-optimal reward.
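As an illustration of step (ii), the following Python sketch (a hypothetical helper, not necessarily the rule used in the paper) greedily schedules which sequences to observe at each instant so that sequence $k$ receives $m_k$ observations; the allocation is feasible whenever $m_k \le n$ for all $k$ and (4) holds.

def greedy_schedule(m, n, s):
    # m[k] = number of observations owed to sequence k; at each of the n instants,
    # observe the (at most) s sequences with the largest number of observations
    # still owed.  Feasible whenever max(m) <= n and sum(m) <= s * n.
    remaining = list(m)
    schedule = []
    for _ in range(n):
        order = sorted(range(len(m)), key=lambda k: remaining[k], reverse=True)
        chosen = [k for k in order if remaining[k] > 0][:s]
        for k in chosen:
            remaining[k] -= 1
        schedule.append(chosen)   # pad with already-finished sequences if |R_t| = s is required exactly
    assert all(r == 0 for r in remaining), "the allocation (m_1, ..., m_K) was not feasible"
    return schedule

# Example: K = 3 sequences, n = 4 instants, s = 2 observed per instant, allocation m = (4, 2, 2).
print(greedy_schedule([4, 2, 2], n=4, s=2))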
In order to formulate the optimization criterion that the $m_k$ must satisfy, we note that since the elements of each sequence are i.i.d., all subsequences of $m_k$ elements have the same statistical behavior. Thus, by the Prophet inequality [3, 15], for each $k \in [K]$,
$\mathsf{E}\big[ X^k_{\tau_k} \big] \;\ge\; \gamma\, \mathsf{E}\Big[ \max_{1 \le t \le m_k} X^k_t \Big], \qquad (5)$
where $\tau_k$ denotes the stopping time obtained by applying Single-DP or PI-thresholding to the $m_k$ observations of sequence $k$. Thus, the $m_k$ are chosen as the maximizers, subject to (4), of
$\sum_{k=1}^{K} G_k(m_k), \qquad (6)$
where
$G_k(m) \;:=\; \mathsf{E}\Big[ \max_{1 \le t \le m} X^k_t \Big], \qquad (7)$
because, by (5), the criterion (6) guarantees that
$\sum_{k=1}^{K} \mathsf{E}\big[ X^k_{\tau_k} \big] \;\ge\; \gamma \sum_{k=1}^{K} G_k(m_k). \qquad (8)$
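Each term in (6)-(7) is the expected maximum of $m$ i.i.d. draws. For nonnegative random variables with c.d.f. $F$, a convenient identity is $\mathsf{E}[\max_{1 \le t \le m} X_t] = \int_0^\infty (1 - F(x)^m)\, dx$, which the following sketch (our own illustration; the function name and the use of scipy quadrature are assumptions) evaluates numerically.

import numpy as np
from scipy.integrate import quad

def expected_max(cdf, m, upper=np.inf):
    # E[max of m i.i.d. nonnegative draws] = integral from 0 to upper of (1 - F(x)^m) dx.
    value, _ = quad(lambda x: 1.0 - cdf(x) ** m, 0.0, upper)
    return value

# Example: m = 4 draws from Uniform(0, 1); the exact value is m / (m + 1) = 0.8.
print(expected_max(lambda x: min(max(x, 0.0), 1.0), m=4, upper=1.0))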
Theorem III.1
The expected reward of the decoupling approach, with the $m_k$ chosen as the maximizers of (6) subject to (4), and with the stopping times determined by either Single-DP or PI-thresholding, is at least $\gamma \approx 0.745$ times the optimal expected reward of the joint problem (3).
Proof:
In view of (8), it suffices to show that
$\sum_{k=1}^{K} G_k(m_k) \;\ge\; \sup_{R \in \mathcal{R}_s}\ \sup_{\boldsymbol{\tau}}\ \sum_{k=1}^{K} \mathsf{E}\big[ X^k_{\tau_k} \big], \qquad (9)$
where $(m_1, \dots, m_K)$ are the maximizers of (6).
Let us denote by $R^*$ the optimal sampling rule for (3), and by $\boldsymbol{\tau}^* = (\tau^*_1, \dots, \tau^*_K)$ the corresponding optimal stopping times, as determined by the dynamic programming algorithm, which we denote by $\mathcal{A}$ throughout the proof. Thus,
$\sup_{R \in \mathcal{R}_s}\ \sup_{\boldsymbol{\tau}}\ \sum_{k=1}^{K} \mathsf{E}\big[ X^k_{\tau_k} \big] \;=\; \sum_{k=1}^{K} \mathsf{E}\big[ X^k_{\tau^*_k} \big]. \qquad (10)$
The event
$\big\{ \tau^*_1 = t_1, \dots, \tau^*_K = t_K \big\} \qquad (11)$
is fully determined by the elements we observed up to the respective times $t_1, \dots, t_K$, i.e., by
$\big\{ X^k_u \,:\, k \in R^*_u,\ u \le t_k,\ k \in [K] \big\}. \qquad (12)$
The algorithm $\mathcal{A}$ also generates the conditions which, based on the samples of each sequence $k$ up to the respective times $t_k$, determine the event (11). Thus, if we denote by $C_{t_1, \dots, t_K}$ the conditions of $\mathcal{A}$ for the times $t_1, \dots, t_K$, with a slight abuse of notation we have
$\big\{ \tau^*_1 = t_1, \dots, \tau^*_K = t_K \big\} \;=\; C_{t_1, \dots, t_K}\big( X^k_u : k \in R^*_u,\ u \le t_k,\ k \in [K] \big).$
Since all sequences are i.i.d. and independent of each other, the random variables in (12) are exchangeable with the first $N^k_{t_k}(R^*)$ random variables of each sequence $k$. As a result, the probability of the event (11), and hence the expected reward in (10), remain unchanged if the elements in (12) are replaced by the first $N^k_{t_k}(R^*)$ elements of each sequence $k$.
Thus, for a sampling rule which examines the elements of each sequence one by one, without missing the elements that it cannot observe, as they remain on stand-by, one has
(13)
and
(14)
Hence, for each $k \in [K]$,
(15)
Therefore, it suffices to show that
(16)
Indeed, by the sampling constraint (2), and since $\tau^*_k \le n$ for all $k \in [K]$, we have
$\sum_{k=1}^{K} N^k_{\tau^*_k}(R^*) \;\le\; \sum_{k=1}^{K} N^k_{n}(R^*) \;=\; s\, n.$
Since each term in (6) is increasing in $m_k$, we conclude (16). ∎
IV Maximization Problem
We focus on the computation of the maximizers $m_1, \dots, m_K$ of (6). For large $n$, finding the exact values is computationally demanding; thus, we suggest an approximation technique along with error guarantees.
For each $k \in [K]$, we restrict our attention to absolutely continuous random variables, whose probability density and cumulative distribution functions are denoted by $f_k$ and $F_k$, respectively. The density of the maximum of $m$ observations from sequence $k$ is denoted by $f_{k,m}$, i.e.,
$f_{k,m}(x) \;:=\; m\, f_k(x)\, F_k(x)^{\,m-1}.$
The maximization problem (6) turns into Problem (P):
$\max_{m_1, \dots, m_K}\ \sum_{k=1}^{K} \int_{0}^{\infty} x\, f_{k, m_k}(x)\, dx \qquad (17)$
subject to
$\sum_{k=1}^{K} m_k \;\le\; s\, n, \qquad (18)$
and $m_k \in \{1, \dots, n\}$, for all $k \in [K]$.
A solution approach to Problem (P) is exhaustive search, of computational complexity $O(n^K)$. Thus, in the following subsection, we introduce a computationally simpler approximation method, whose computational complexity depends only on $K$ and is hence independent of $n$, with an approximation error converging to $0$ as $n \to \infty$.
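For small instances, Problem (P) can be solved by direct enumeration; a minimal sketch (our own illustrative code, reusing the hypothetical expected_max helper from Section III) is given below, and its running time indeed grows like $n^K$.

from itertools import product

def solve_p_exhaustive(cdfs, uppers, n, s):
    # Enumerate every allocation (m_1, ..., m_K) with 1 <= m_k <= n and sum <= s * n,
    # and keep the one that maximizes the objective (17).
    best_val, best_m = float("-inf"), None
    for m in product(range(1, n + 1), repeat=len(cdfs)):
        if sum(m) > s * n:
            continue
        val = sum(expected_max(cdf, mk, up) for cdf, mk, up in zip(cdfs, m, uppers))
        if val > best_val:
            best_val, best_m = val, m
    return best_m, best_val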
IV-A The approximation problem
For each $k \in [K]$, we set $x_k := m_k / n$ and rewrite Problem (P) as Problem (Q):
$\max_{x_1, \dots, x_K}\ \sum_{k=1}^{K} \int_{0}^{\infty} x\, f_{k,\, n x_k}(x)\, dx \qquad (19)$
subject to
$\sum_{k=1}^{K} x_k \;\le\; s, \qquad (20)$
and $x_k \in [1/n, 1]$, for all $k \in [K]$, where the $x_k$ are now allowed to take any real value in this interval. We denote by $H(\mathbf{x})$ the objective function (19), where $\mathbf{x} := (x_1, \dots, x_K)$ lies in the set
$D \;:=\; \Big\{ \mathbf{x} \in [1/n, 1]^K \,:\, \sum_{k=1}^{K} x_k \le s \Big\}.$
For simplicity, we assume that the function $H$ is differentiable everywhere on $D$. Thus, the function $H$ is continuous everywhere on the compact set $D$, and by the extreme value theorem [16, Theorem 4.16] it achieves a maximum on $D$. By the Lagrange multipliers method, and provided that the induced system of equations has a solution, we obtain the solution of Problem (Q), denoted by $\mathbf{x}^* = (x^*_1, \dots, x^*_K)$. Then, within Problem (P), we replace each $m_k$ by
$m_k \in \big\{ \lfloor n x^*_k \rfloor,\ \lceil n x^*_k \rceil \big\},$
which reduces the number of possible values from $n$ to at most $2$, for each $k \in [K]$. Hence, we obtain Problem (P'):
$\max_{m_1, \dots, m_K}\ \sum_{k=1}^{K} \int_{0}^{\infty} x\, f_{k, m_k}(x)\, dx \qquad (21)$
subject to
$\sum_{k=1}^{K} m_k \;\le\; s\, n \quad \text{and} \quad m_k \in \big\{ \lfloor n x^*_k \rfloor,\ \lceil n x^*_k \rceil \big\}, \quad \text{for all } k \in [K]. \qquad (22)$
Solving Problem (P') by exhaustive search over its at most $2^K$ candidate allocations, we compute an approximation of the optimal solution of Problem (P).
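The relaxation-and-rounding procedure of this subsection can be sketched as follows (our own illustration: a generic constrained solver, scipy's SLSQP, stands in for the explicit Lagrange conditions, and expected_max is the hypothetical helper from Section III).

import numpy as np
from itertools import product
from scipy.optimize import minimize

def solve_p_approx(cdfs, uppers, n, s):
    K = len(cdfs)
    # Objective (19): H(x) with m_k = n * x_k (the exponent in F^m may be non-integer).
    H = lambda x: sum(expected_max(cdf, n * xk, up)
                      for cdf, xk, up in zip(cdfs, x, uppers))
    # Step 1: continuous relaxation (Problem Q) over the set D.
    res = minimize(lambda x: -H(x), x0=np.full(K, min(1.0, s / K)),
                   bounds=[(1.0 / n, 1.0)] * K,
                   constraints=[{"type": "ineq", "fun": lambda x: s - np.sum(x)}],
                   method="SLSQP")
    x_star = res.x
    # Step 2: Problem P' -- round each n * x_k^* to its floor or ceiling and keep
    # the best feasible combination among the at most 2^K candidates.
    best_val, best_m = float("-inf"), None
    candidates = [sorted({max(1, int(np.floor(n * xk))), int(np.ceil(n * xk))}) for xk in x_star]
    for m in product(*candidates):
        if sum(m) > s * n:
            continue
        val = sum(expected_max(cdf, mk, up) for cdf, mk, up in zip(cdfs, m, uppers))
        if val > best_val:
            best_val, best_m = val, m
    return best_m, best_val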
IV-B The cost of approximation
We provide a bound on the approximation error for the sample sizes $m_1, \dots, m_K$, which converges to $0$ as $n$ increases. We denote by $H$ the objective function of Problems (P), (Q), and (P'), viewed as a function of $\mathbf{x} = (x_1, \dots, x_K)$, and by

(i) $\mathbf{x}^{\mathrm{P}} = (x^{\mathrm{P}}_1, \dots, x^{\mathrm{P}}_K)$ the optimal solution of Problem (P), with $x^{\mathrm{P}}_k := m^{\mathrm{P}}_k / n$, for all $k \in [K]$;

(ii) $\mathbf{x}' = (x'_1, \dots, x'_K)$ the optimal solution of Problem (P'), with $x'_k := m'_k / n$, for all $k \in [K]$.

We define the cost of approximation as
$\mathrm{cost}(n) \;:=\; H\big(\mathbf{x}^{\mathrm{P}}\big) - H\big(\mathbf{x}'\big). \qquad (23)$
Next, we prove that under smoothness conditions on $H$, the cost is bounded by a constant multiplied by $1/n$,
$\mathrm{cost}(n) \;\le\; \frac{C}{n}. \qquad (24)$
Theorem IV.1
If $H$ is everywhere differentiable on $D$, with bounded derivative, then $\mathrm{cost}(n) \le C/n$, where
$C \;:=\; \sup_{\mathbf{x} \in D} \big\| \nabla H(\mathbf{x}) \big\|_{1}. \qquad (25)$
Proof:
By the definition of the objective functions of Problems (P) and (Q), we observe that for any allocation $(m_1, \dots, m_K)$ that is feasible for Problem (P) it holds that
$\sum_{k=1}^{K} \int_{0}^{\infty} x\, f_{k, m_k}(x)\, dx \;=\; H\Big( \frac{m_1}{n}, \dots, \frac{m_K}{n} \Big), \qquad (26)$
which implies that
$H\big(\mathbf{x}^{\mathrm{P}}\big) \;\le\; H\big(\mathbf{x}^{*}\big), \qquad (27)$
where the inequality follows by the optimality of $\mathbf{x}^{*}$ for Problem (Q), since $\mathbf{x}^{\mathrm{P}} \in D$. Also, by the optimality of $\mathbf{x}'$ for Problem (P'), we have
$H\big(\mathbf{x}'\big) \;\ge\; H\big( \lfloor n \mathbf{x}^{*} \rfloor / n \big). \qquad (28)$
By definition (23), and inequalities (27)-(28), we have
$\mathrm{cost}(n) \;\le\; H\big(\mathbf{x}^{*}\big) - H\big( \lfloor n \mathbf{x}^{*} \rfloor / n \big). \qquad (29)$
Since $H$ is everywhere differentiable with bounded derivative on $D$, and $D$ is convex, by Rademacher’s theorem [17, Theorem 1.41] we deduce that
$H\big(\mathbf{x}^{*}\big) - H\big( \lfloor n \mathbf{x}^{*} \rfloor / n \big) \;\le\; C\, \big\| \mathbf{x}^{*} - \lfloor n \mathbf{x}^{*} \rfloor / n \big\|_{\infty} \;\le\; \frac{C}{n}, \qquad (30)$
where $C$ is as defined in (25). This proves the claim. ∎
IV-C Uniform densities example
We consider $K$ sequences of $n$ i.i.d. random variables each, where sequence $k$ follows the uniform distribution on $[a_k, b_k]$, with $0 \le a_k < b_k$, for each $k \in [K]$. We can observe only $s$ sequences at each time instant. In this case, Problem (Q) takes the form
$\max\ \sum_{k=1}^{K} \Big( a_k + (b_k - a_k)\, \frac{n x_k}{n x_k + 1} \Big), \qquad (31)$
subject to its underlying constraints. By the Lagrange multipliers method, for each $k \in [K]$, we have
$x^{*}_k \;=\; \frac{\sqrt{b_k - a_k}}{\sum_{j=1}^{K} \sqrt{b_j - a_j}} \Big( s + \frac{K}{n} \Big) \;-\; \frac{1}{n}. \qquad (32)$
Remark: For all $k \in [K]$, the differences $b_k - a_k$ determine the values of $x^*_k$, which implies that the variances $(b_k - a_k)^2 / 12$ govern the sampling rates.
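For completeness, here is a sketch of how a closed form such as (32) is obtained, under the assumptions that the relaxed constraint (20) is binding and that the box constraints on the $x_k$ are inactive, so that the Lagrange stationarity conditions characterize the maximizer of (31):
\[
\frac{\partial}{\partial x_k} \sum_{j=1}^{K} \Big( a_j + (b_j - a_j)\, \frac{n x_j}{n x_j + 1} \Big)
   \;=\; \frac{n\, (b_k - a_k)}{(n x_k + 1)^2} \;=\; \lambda
   \quad \Longrightarrow \quad
   n x_k + 1 \;=\; \sqrt{\frac{n\, (b_k - a_k)}{\lambda}} .
\]
Summing the right-hand side over $k$ and using $\sum_{k} x_k = s$ gives $\sqrt{n/\lambda} = (n s + K) \big/ \sum_{j} \sqrt{b_j - a_j}$, which, substituted back, yields the expression in (32). In particular, the sampling fraction $x^*_k$ is increasing in $b_k - a_k$, i.e., in the variance of sequence $k$.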
IV-D Gaussian densities example
In various frameworks, e.g., [5, Section 5.1], each sequence $k$ follows a Gaussian density $\mathcal{N}(\mu_k, \sigma_k^2)$, for $k \in [K]$, and we can observe $s$ sequences at each instant. We assume that the means of the Gaussians are large enough, and the variances relatively small, so that the random variables are positive with probability practically equal to one. According to Blom’s formula [18], and the approximation of the inverse error function presented in [19], for $m$ large enough and for each $k \in [K]$,
$\mathsf{E}\Big[ \max_{1 \le t \le m} X^k_t \Big] \;\approx\; \mu_k + \sigma_k\, \Phi^{-1}\Big( \frac{m - 0.375}{m + 0.25} \Big),$
where $\Phi^{-1}$ denotes the standard normal quantile function, which admits a closed-form approximation via the inverse error function [19]. Thus, Problem (Q) reduces to
(33)
subject to its defining constraints. By the Lagrange multipliers method, we obtain
(34)
We observe that, except for a change in a constant multiplier, the $x^*_k$ in (34) exhibit the same dependence on the variances as those in (32), especially for large $n$.
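As a numerical sanity check of the Blom-based approximation used above (our own illustration; scipy's norm.ppf plays the role of $\Phi^{-1}$, and $0.375$ is Blom's constant $\alpha$):

import numpy as np
from scipy.stats import norm

def expected_max_gaussian(mu, sigma, m):
    # Blom's approximation: E[max of m i.i.d. N(mu, sigma^2)] ~ mu + sigma * Phi^{-1}((m - 0.375) / (m + 0.25)).
    return mu + sigma * norm.ppf((m - 0.375) / (m + 0.25))

# Monte Carlo check for m = 50 draws from N(10, 2^2); the two printed values should be close.
rng = np.random.default_rng(0)
mc = rng.normal(10.0, 2.0, size=(100_000, 50)).max(axis=1).mean()
print(expected_max_gaussian(10.0, 2.0, 50), mc)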
V Computational examples
We compare the expected reward of the joint dynamic programming problem (3) with that of the decoupled problem. For the decoupled problem, we precompute the optimal number of observations for each sequence, and then, on each individual sequence, we run either (i) single-sequence dynamic programming (Single-DP) or (ii) the Prophet Inequality thresholding (PI-thresholding) method [3, Corollary 4.7].
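A sketch of how the expected reward of the decoupled approach can be estimated by Monte Carlo (our own illustrative code, reusing the hypothetical single_dp_thresholds and stop_with_thresholds helpers from the sketch in Section I; the uniform distributions below are placeholders with equal means, not necessarily the ones used for the figures):

import numpy as np

def decoupled_reward(samplers, m, n_mc=20_000, rng=None):
    # Monte Carlo estimate of the decoupled expected reward: sequence k contributes the
    # value selected by Single-DP among its m[k] observations.
    rng = np.random.default_rng(rng)
    total = 0.0
    for sampler, mk in zip(samplers, m):
        thr = single_dp_thresholds(sampler, mk, rng=rng)
        vals = [stop_with_thresholds(sampler(mk, rng), thr)[1] for _ in range(n_mc)]
        total += float(np.mean(vals))
    return total

# Three uniform sequences with equal means (0.5) and different variances.
samplers = [lambda sz, rng, a=a, b=b: rng.uniform(a, b, sz)
            for a, b in [(0.4, 0.6), (0.25, 0.75), (0.0, 1.0)]]
print(decoupled_reward(samplers, m=[2, 4, 6]))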
We consider sequences that follow three different uniform distributions, which have the same mean but different variances, for a range of sequence lengths $n$. We consider two cases for the number $s$ of sequences observed per instant, and we plot the ratio of the expected reward of the decoupled problem, for both Single-DP and PI-thresholding, over that of the joint problem.
In the first case, the decoupling approach offers a very good approximation of problem (3), for both the Single-DP and the PI-thresholding stopping rules, for all the lengths $n$ considered. We also note that the ratio achieved by the PI-thresholding method is only slightly smaller than that of the Single-DP method, which is always optimal for a single sequence.
In the second case, the ratio is smaller than in the former case, but the decoupling approach still offers a good approximation, for both Single-DP and PI-thresholding, for all the lengths $n$ considered. In this case, the gap between Single-DP and PI-thresholding is larger, but remains small.
Acknowledgments
The work was supported in part by NSF awards 2008125 and 1956384, through the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign.
References
- [1] S. Alaei, “Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers,” SIAM Journal on Computing, vol. 43, no. 2, pp. 930–972, 2014.
- [2] M. Feldman, N. Gravin, and B. Lucier, “Combinatorial auctions via posted prices,” in Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2014, pp. 123–135.
- [3] J. Correa, P. Foncea, R. Hoeksma, T. Oosterwijk, and T. Vredeveld, “Posted price mechanisms for a random stream of customers,” in Proceedings of the 2017 ACM Conference on Economics and Computation, 2017, pp. 169–186.
- [4] E. Lee and S. Singla, “Optimal online contention resolution schemes via ex-ante prophet inequalities,” in 26th Annual European Symposium on Algorithms, ESA 2018, August 20-22, 2018, Helsinki, Finland, vol. 112, 2018, pp. 57:1–57:14.
- [5] Z. Huang, J. A. Joao, A. Rico, A. D. Hilton, and B. C. Lee, “Dynasprint: Microarchitectural sprints with dynamic utility and thermal management,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019, pp. 426–439.
- [6] M. Epitropou and R. Vohra, “Optimal on-line allocation rules with verification,” in Algorithmic Game Theory: 12th International Symposium, SAGT 2019, Athens, Greece, September 30–October 3, 2019, Proceedings 12. Springer, 2019, pp. 3–17.
- [7] R. N. Bhattacharya and E. C. Waymire, Random walk, Brownian motion, and martingales. Springer, 2021.
- [8] S. Nitinawarat, G. K. Atia, and V. V. Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
- [9] Q. Xu, Y. Mei, and G. V. Moustakides, “Optimum multi-stream sequential change-point detection with sampling control,” IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7627–7636, 2021.
- [10] Q. Zhao, Multi-armed bandits: Theory and applications to online learning in networks. Springer Nature, 2022.
- [11] U. Krengel and L. Sucheston, “On semiamarts, amarts, and processes with finite value,” Probability on Banach spaces, vol. 4, pp. 197–266, 1978.
- [12] D. Assaf and E. Samuel-Cahn, “Simple ratio prophet inequalities for a mortal with multiple choices,” Journal of Applied Probability, vol. 37, no. 4, pp. 1084–1091, 2000.
- [13] J. Correa, R. Saona, and B. Ziliotto, “Prophet secretary through blind strategies,” Mathematical Programming, vol. 190, no. 1-2, pp. 483–521, 2021.
- [14] A. Bubna and A. Chiplunkar, “Prophet inequality: Order selection beats random order,” in Proceedings of the 24th ACM Conference on Economics and Computation, 2023, pp. 302–336.
- [15] R. P. Kertz, “Stop rule and supremum expectations of i.i.d. random variables: a complete comparison by conjugate duality,” Journal of Multivariate Analysis, vol. 19, no. 1, pp. 88–112, 1986.
- [16] W. Rudin, Principles of Mathematical Analysis, 1953.
- [17] N. Weaver, Lipschitz algebras. World Scientific, 2018.
- [18] J. Royston, “Algorithm AS 177: Expected normal order statistics (exact and approximate),” Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 31, no. 2, pp. 161–165, 1982.
- [19] D. Dominici, “Some properties of the inverse error function,” Contemporary Mathematics, vol. 457, pp. 191–204, 2008.