
Multiple-Sequence Prophet Inequality Under Observation Constraints

Aristomenis Tsopelakos and Olgica Milenkovic, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Abstract

In our problem, we are given access to a number of sequences of nonnegative i.i.d. random variables, whose realizations are observed sequentially. All sequences are of the same finite length. The goal is to pick one element from each sequence so as to maximize a reward equal to the expected value of the sum of the selections from all sequences. The decision to pick an element is irrevocable, i.e., rejected observations cannot be revisited. Furthermore, the procedure terminates upon having a single selection from each sequence. Our observation constraint is that we cannot observe the current realization of all sequences at each time instant; instead, we can observe only a smaller, yet arbitrary, subset of them. Thus, together with a stopping rule that determines whether we choose or reject a sample, the solution requires a sampling rule that determines which sequences to observe at each instant. The problem can be solved via dynamic programming, but with complexity exponential in the length of the sequences. To make the solution computationally tractable, we introduce a decoupling approach and determine each stopping time using either single-sequence dynamic programming or a Prophet Inequality inspired threshold method, both with complexity polynomial in the length of the sequences. We prove that the decoupling approach guarantees at least 0.745 of the optimal expected reward of the joint problem. In addition, we describe how to efficiently compute the optimal number of samples for each sequence, and its dependence on the variances of the sequences.

I Introduction

In many applications, multiple data sequences are monitored sequentially with the aim of deciding the best instant to terminate the observation procedure and using the collected information to maximize an objective. Financial applications include the design of posted pricing mechanisms for auctions [1, 2, 3] and contention resolution schemes [4]. Recent engineering applications focus on the optimization of computer hardware performance, such as computational sprinting [5, 6], which provides a significant performance boost to microchips. These applications, among many others, have motivated a large body of work in optimal stopping theory, with dynamic programming [7, Ch. 24] being the primary solution method for a broad class of them.

In systems with multiple data sequences, it is not always possible to observe the current sample of every sequence, due to resource limitations or other observation constraints [8, 9]. For example, when following a large number of auctions in parallel, it is difficult to analyze all offers at all times. In computational sprinting hardware mechanisms, software predicts the performance boost of a microchip when short-term overheating is allowed, but limited computational resources make it impossible to run the software for all microchips at each instant. In both cases, only a smaller number of sequences can be processed at any given instant, but there is no restriction on which ones they are. This type of observation constraint leads to a problem at the intersection of optimal stopping and multi-armed bandit theory [10].

In our problem, we assume that the data sequences comprise i.i.d. random variables, that the sequences are independent of each other, and that their distributions are allowed to differ. Our constraint is that we can only observe a fixed-size subset of the sequences at each instant. Our goal is to determine which sequences to observe at each instant, and when to stop sampling each sequence, in order to maximize our reward, which is the sum of the expected values of the observations at the selected stopping times. This is a combined optimal stopping and sampling problem, which can be treated by dynamic programming with complexity exponential in the length of the sequences. To reduce the complexity, we introduce a decoupling approach, based on the Prophet Inequality [3, 11, 12, 13, 14], that reduces the complexity to polynomial and guarantees at least 0.745 of the optimal expected reward of the joint problem.

The Prophet Inequality compares, for each sequence, the expected value of the element picked at the optimal stopping time with the expected value of the maximum element of the sequence, by providing a tight lower bound on the ratio of the two expected values. For an i.i.d. sequence, this lower bound is proven in [3, 15] to equal 0.745, independently of the distribution and of the length of the sequence. This benchmark has motivated the design of stopping rules for our problem which always satisfy the Prophet Inequality, although they may be sub-optimal in some cases. In [3, Corollary 4.7], an algorithm is provided for computing the thresholds of such a stopping rule; it is simpler than single-sequence dynamic programming (Single-DP), although both methods have complexity linear in the length of the sequence. We refer to the algorithm in [3] as the Prophet Inequality thresholding method (PI-thresholding).
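
To make the Single-DP baseline concrete, the following is a minimal sketch of the backward-induction computation for one i.i.d. sequence. The $U[0,3]$ distribution, the Monte Carlo estimation of the expectations, and all function names are our illustrative assumptions, not constructions from the paper.

```python
import numpy as np

def single_dp(samples, n):
    """Single-sequence dynamic programming (backward induction).

    V[m] is the expected reward of acting optimally from time m onward:
    V[n] = E[X], since the last sample must be accepted, and for m < n,
    V[m] = E[max(X, V[m+1])].  The optimal rule accepts X(m) iff
    X(m) >= V[m+1]; expectations are estimated from `samples`.
    """
    V = np.zeros(n + 2)
    V[n] = samples.mean()
    for m in range(n - 1, 0, -1):
        V[m] = np.maximum(samples, V[m + 1]).mean()
    return V[1], V[2:n + 1]   # optimal expected reward, thresholds for times 1..n-1

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 3.0, size=200_000)   # illustrative U[0,3] sequence
reward, thresholds = single_dp(X, n=10)
print(f"optimal expected reward for n = 10: {reward:.3f}")
```

The thresholds are the continuation values: at time $m<n$ we accept $X(m)$ iff it exceeds $V[m+1]$, and at time $n$ we must accept; the computation is linear in $n$, as noted above.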

The basic requirement of the decoupling approach is the calculation of the optimal number of samples for each sequence, based on which we can apply Single-DP or PI-thresholding to each sequence separately. This results in a constrained optimization problem that depends on the distributions of all sequences. The computation may be intensive for long sequences; thus, under some smoothness assumptions on the optimization objective, we develop an approximation technique for the number of samples whose error converges to zero as the length of the sequences grows to infinity.

The paper is organized as follows. In Section II, we present the problem formulation, while in Section III we prove our first result, pertaining to the approximation ratio of the decoupling approach. In Section IV, we describe the approximation technique for the computation of the optimal number of samples for each sequence. Finally, in Section V, we present computational examples which support the efficiency of the decoupling approach, for both Single-DP and the PI-thresholding method.

II Problem formulation

Let $(\mathbb{R}_{+},\mathcal{S})$ be an arbitrary measurable space and $(\Omega,\mathcal{F},\mathsf{P})$ a probability space which hosts $M$ independent sequences of $n$ i.i.d. $\mathbb{R}_{+}$-valued random variables,

X_{i}:=\{X_{i}(m)\,:\,m\in[n]\},\qquad i\in[M],

where $[n]:=\{1,\ldots,n\}$ and $[M]:=\{1,\ldots,M\}$. We aim to find, for each sequence $i\in[M]$, the stopping time $\tau_{i}\in[n]$ that attains

\sup_{\tau_{i}\in\mathcal{T}}\mathsf{E}[X_{i}(\tau_{i})], \quad (1)

where $\mathcal{T}$ is the class of all stopping times that take values in $[n]$. Formally, a stopping time is a random variable $\tau_{i}$, $i\in[M]$, for which the event $\{\tau_{i}=m\}$, $m\in[n]$, is fully determined by the observations up to time $m$. Since each sequence $i\in[M]$ is associated with a stopping time $\tau_{i}$, we make subsequent use of the vector of stopping times of all sequences, i.e., $T:=(\tau_{1},\ldots,\tau_{M})$.

The observations from each sequence $i\in[M]$ are made sequentially, and our decision to stop at time $\tau_{i}$ and pick $X_{i}(\tau_{i})$ as our “best” choice for the optimization problem (1) is irrevocable, i.e., we cannot revisit samples we rejected, nor can we examine samples that follow after we stop. The Prophet Inequality [3, 15] provides a tight lower bound of $0.745$ on the ratio of (1) over $\mathsf{E}[\max_{m\in[n]}X_{i}(m)]$, for each $i\in[M]$. By the term “tight”, we mean that for any $n$ there exists a distribution that attains the inequality with equality [15].

The main aspect that distinguishes our problem from the relevant literature is the introduction of observation constraints to our model: it is not possible to observe the current element of every sequence at each instant $m\in[n]$, but only of a subset of them of size $K<M$, regardless of which sequences they are. Hence, we denote by $R(m)$ the subset of sequences that are observed at time $m$, i.e.,

R(m):=\left\{i\in[M]\,:\,R_{i}(m)=1\right\},

where

R_{i}(m):=\mathbf{1}\left\{\mbox{observe}\;X_{i}(m)\right\}.

We say that $R$ is a sampling rule if, for every $m\in[n-1]$, $R(m+1)$ is $\mathcal{F}^{R}(m)$-measurable, where $\mathcal{F}^{R}(m)$ is the $\sigma$-algebra generated by the elements observed up to time $m$ according to the rule $R$, i.e.,

\mathcal{F}^{R}(m):=\begin{cases}\{\emptyset,\Omega\},&\mbox{ if }\;m=0,\\ \sigma\left(\mathcal{F}^{R}(m-1),\{X_{i}(m)\,:\,i\in R(m)\}\right),&\mbox{ if }\;m\in[n].\end{cases}

The policy $(R,T)$ belongs to the class $\mathcal{C}(K)$ if the number of sequences observed at each sampling instant is equal to $K$, i.e.,

\sum_{i=1}^{M}R_{i}(m)=K,\quad\forall\;m\in[n]. \quad (2)

Our goal is to find a policy $(R,T)\in\mathcal{C}(K)$ that optimizes the objective

\sup_{(R,T)\in\mathcal{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[X_{i}(\tau_{i})\right], \quad (3)

under the assumption that, for each $i\in[M]$, the i.i.d. sequence $X_{i}$ has a finite first moment.

III The Decoupling Approach

The optimization problem (3) can be solved via dynamic programming, with computational complexity $O\big(\binom{M}{K}^{n}\big)$. In order to reduce the complexity, we describe a decoupling approach that produces a $0.745$-approximation for (3), with complexity polynomial in $n$.

For any sampling rule $R$, we denote by

N^{R}_{i}(n):=\sum_{m=1}^{n}R_{i}(m),\quad i\in[M],

the total number of elements we have observed from sequence $i$ up to time $n$, and by $\tau^{R}_{i}$ the optimal stopping time of sequence $i$ associated with the sampling rule $R$. The decoupling approach consists of the following steps:

  1. (i)

    Since, for each $i\in[M]$, the sequence $X_{i}$ is i.i.d., it suffices to determine the number of observations $N_{i}(n)$ for sequence $i$, without pinpointing the exact times at which we observe. The values $\{N_{i}(n)\,:\,i\in[M]\}$ must satisfy a particular optimization criterion, independent of $R$.

  2. (ii)

    We design a sampling rule $R^{d}$ (a simple greedy construction is sketched after this list) which guarantees $N_{i}(n)$ observations for each $i\in[M]$ and respects the sampling constraint (2), which implies

    N_{i}^{R^{d}}(n)=N_{i}(n),\quad\forall\;i\in[M],\qquad\sum_{i=1}^{M}N_{i}^{R^{d}}(n)=Kn. \quad (4)
  3. (iii)

    Given the $N_{i}^{R^{d}}(n)$ observations for each $i\in[M]$, in order to determine the decoupled optimal stopping times $\tau_{i}^{R^{d}}$, $i\in[M]$, we can use either single-sequence dynamic programming [7, Chapter 24] or the Prophet Inequality thresholding method in [3, Corollary 4.7]. The latter, although computationally simpler, may yield a sub-optimal reward.
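
The construction of $R^{d}$ in step (ii) is not prescribed above; one simple realization, sketched below under the assumption that $\sum_{i}N_{i}(n)=Kn$ and $N_{i}(n)\leq n$, observes at each instant the $K$ sequences with the largest remaining quotas.

```python
import heapq

def greedy_schedule(quotas, n, K):
    """Greedy realization of a sampling rule R^d meeting quotas N_i(n).

    Assumes sum(quotas) == K*n and max(quotas) <= n.  At each instant,
    the K sequences with the largest remaining quotas are observed,
    which exhausts all quotas exactly by time n.
    Returns schedule[m] = indices of the sequences observed at time m+1.
    """
    assert sum(quotas) == K * n and max(quotas) <= n
    remaining = list(quotas)
    schedule = []
    for _ in range(n):
        # indices of the K largest remaining quotas
        picked = heapq.nlargest(K, range(len(remaining)),
                                key=lambda i: remaining[i])
        for i in picked:
            remaining[i] -= 1
        schedule.append(picked)
    return schedule

print(greedy_schedule([5, 3, 2], n=5, K=2))
```

The greedy choice is feasible because, if a sequence whose remaining quota equals the remaining horizon were ever left unobserved, the total remaining quota would exceed $K$ times the remaining horizon, contradicting the invariant maintained at every step.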

In order to formulate the optimization criterion that the $\{N_{i}(n):i\in[M]\}$ must satisfy, we note that, since the elements of each sequence are i.i.d., all subsequences of $N_{i}(n)$ elements have the same statistical behavior; thus, by the Prophet Inequality [3, 15], for $R^{d}$ and for each $i\in[M]$,

\mathsf{E}\left[X_{i}(\tau^{R^{d}}_{i})\right]\geq D\,\mathsf{E}\left[\max_{1\leq m\leq N^{R^{d}}_{i}(n)}X_{i}(m)\right], \quad (5)

where $D:=0.745$. Thus, the $\{N_{i}(n):i\in[M]\}$ are chosen as the maximizers of

\max_{(n_{1},\ldots,n_{M})\in\mathfrak{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[\max_{1\leq m\leq n_{i}}X_{i}(m)\right], \quad (6)

where

\mathfrak{C}(K):=\left\{(n_{1},\ldots,n_{M})\in[n]^{M}\,:\,\sum_{i=1}^{M}n_{i}=Kn\right\}, \quad (7)

because by (5), the criterion (6) guarantees that

\sup_{(R,T)\in\mathcal{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[X_{i}(\tau_{i})\right]\geq D\max_{(n_{1},\ldots,n_{M})\in\mathfrak{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[\max_{1\leq m\leq n_{i}}X_{i}(m)\right]. \quad (8)
Theorem III.1

The decoupling approach achieves at least $0.745$ of the optimal expected reward of the joint optimization problem (3), provided that the number of samples from each sequence is optimized as in (6).

Proof:

In view of (8), it suffices to show that

\sup_{(R,T)\in\mathcal{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[X_{i}(\tau_{i})\right]\leq\max_{(n_{1},\ldots,n_{M})\in\mathfrak{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[\max_{1\leq m\leq n_{i}}X_{i}(m)\right]. \quad (9)

Let us denote by $R^{*}$ the optimal sampling rule for (3), as determined by the dynamic programming algorithm, which we denote by $\mathcal{A}$ throughout the proof. Thus,

\sup_{(R,T)\in\mathcal{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[X_{i}(\tau_{i})\right]=\sum_{i=1}^{M}\mathsf{E}\left[X_{i}\left(\tau^{R^{*}}_{i}\right)\right]. \quad (10)

The event

\{\tau^{R^{*}}_{1}=t_{1},\ldots,\tau^{R^{*}}_{M}=t_{M}\} \quad (11)

is fully determined by the elements we observed up to the respective times $t_{1},\ldots,t_{M}\in[n]$, i.e.,

\bigcup_{i=1}^{M}\left\{X_{i}(k)R^{*}_{i}(k)\,:\,1\leq k\leq t_{i}\right\}. \quad (12)

The algorithm $\mathcal{A}$ also generates the conditions which, based on the samples of each sequence up to the times $\{t_{i}:i\in[M]\}$, respectively, determine the event (11). Thus, if we denote by $\mathcal{A}_{t_{1},\ldots,t_{M}}$ the conditions of $\mathcal{A}$ for the times $\{t_{i}:i\in[M]\}$, with a slight abuse of notation we have

\mathsf{P}\left(\tau^{R^{*}}_{1}=t_{1},\ldots,\tau^{R^{*}}_{M}=t_{M}\right)=\mathsf{P}\left(\mathcal{A}_{t_{1},\ldots,t_{M}}\left(\bigcup_{i=1}^{M}\left\{X_{i}(k)R^{*}_{i}(k)\,:\,1\leq k\leq t_{i}\right\}\right)\right).

Since all sequences are i.i.d. and independent of each other, the random variables in (12) are interchangeable with the first $N^{R^{*}}_{i}(t_{i})$ random variables in each sequence $i\in[M]$. As a result,

\mathsf{P}\left(\tau^{R^{*}}_{1}=t_{1},\ldots,\tau^{R^{*}}_{M}=t_{M}\right)=\mathsf{P}\left(\mathcal{A}_{t_{1},\ldots,t_{M}}\left(\bigcup_{i=1}^{M}\left\{X_{i}(k)\,:\,1\leq k\leq N^{R^{*}}_{i}(t_{i})\right\}\right)\right).

Thus, for a sampling rule $\widetilde{R}$ which examines the elements one by one, without missing the elements that it cannot observe immediately, as they remain on standby, one has

\sum_{i=1}^{M}\mathsf{E}\left[X_{i}\left(\tau^{R^{*}}_{i}\right)\right]=\sum_{i=1}^{M}\mathsf{E}\left[X_{i}\left(\tau^{\widetilde{R}}_{i}\right)\right], \quad (13)

and

\tau^{\widetilde{R}}_{i}=N^{R^{*}}_{i}\left(\tau^{R^{*}}_{i}\right)\quad\mbox{a.s.},\quad\forall\;i\in[M]. \quad (14)

Hence, for each $i\in[M]$,

\mathsf{E}\left[X_{i}\left(\tau^{\widetilde{R}}_{i}\right)\right]=\mathsf{E}\left[X_{i}\left(N^{R^{*}}_{i}\left(\tau^{R^{*}}_{i}\right)\right)\right]\leq\mathsf{E}\left[\max_{1\leq m\leq N^{R^{*}}_{i}\left(\tau^{R^{*}}_{i}\right)}X_{i}(m)\right]. \quad (15)

Therefore, it suffices to show that

\sum_{i=1}^{M}\mathsf{E}\left[\max_{1\leq m\leq N^{R^{*}}_{i}(\tau^{R^{*}}_{i})}X_{i}(m)\right]\leq\max_{(n_{1},\ldots,n_{M})\in\mathfrak{C}(K)}\sum_{i=1}^{M}\mathsf{E}\left[\max_{1\leq m\leq n_{i}}X_{i}(m)\right]. \quad (16)

Indeed, by the sampling constraint (2), and since $\tau^{R^{*}}_{i}\leq n$ for all $i\in[M]$, we have

\sum_{i=1}^{M}N_{i}^{R^{*}}(\tau^{R^{*}}_{i})\leq\sum_{i=1}^{M}\sum_{m=1}^{n}R^{*}_{i}(m)\leq Kn.

Since each term in (6) is increasing in $n_{i}$, we conclude (16). ∎

IV Maximization Problem

We focus on the computation of the maximizers $n_{1}^{*},\ldots,n_{M}^{*}$ of (6). For large $n$, finding the exact values is computationally demanding; thus, we suggest an approximation technique along with error guarantees.

For each $i\in[M]$, we restrict our attention to absolutely continuous random variables, whose probability density and cumulative distribution functions are denoted by $f_{i}$ and $F_{i}$, respectively. The density of the maximum of $n_{i}$ observations from sequence $i$ is denoted by $g_{i}$, i.e.,

g_{i}(x):=n_{i}\left(F_{i}(x)\right)^{n_{i}-1}f_{i}(x),\quad x\geq 0.

The maximization problem (6) turns into Problem $\mathbf{P_{1}}$:

\max_{n_{1},\ldots,n_{M}}\sum_{i=1}^{M}n_{i}\int_{0}^{\infty}x\left(F_{i}(x)\right)^{n_{i}-1}f_{i}(x)\,dx, \quad (17)

subject to

\sum_{i=1}^{M}n_{i}=Kn, \quad (18)

and $0\leq n_{i}\leq n$, for all $i\in[M]$.

One solution approach to $\mathbf{P_{1}}$ is exhaustive search, of computational complexity $O\left(n^{M}\right)$. Thus, in the following subsection, we introduce a computationally simpler approximation method, whose complexity is $O\left(2^{M}\right)$, hence independent of $n$, and whose approximation error converges to $0$ as $n\to\infty$.
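
For illustration, a minimal sketch of this exhaustive search: we tabulate $\mathsf{E}[\max_{1\leq m\leq k}X_{i}(m)]$ by Monte Carlo and scan $\mathfrak{C}(K)$. The uniform distributions and the small values of $(M,n,K)$ are assumptions made only for the example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def expected_max_table(sampler, n, reps=100_000):
    """phi[k] ~ E[max of k i.i.d. draws], k = 1..n, via Monte Carlo."""
    draws = sampler((reps, n))
    running_max = np.maximum.accumulate(draws, axis=1)
    return np.concatenate(([0.0], running_max.mean(axis=0)))  # phi[0] unused

# illustrative setup: M = 3 uniform sequences, n = 6, K = 1
M, n, K = 3, 6, 1
samplers = [lambda s: rng.uniform(0.0, 3.0, s),
            lambda s: rng.uniform(0.5, 2.5, s),
            lambda s: rng.uniform(1.0, 2.0, s)]
phi = [expected_max_table(f, n) for f in samplers]

# exhaustive search over C(K): all (n_1,...,n_M) in [n]^M with sum = K*n
candidates = (a for a in itertools.product(range(1, n + 1), repeat=M)
              if sum(a) == K * n)
best = max(candidates, key=lambda a: sum(phi[i][a[i]] for i in range(M)))
print("optimal split (n_1, n_2, n_3):", best)
```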

IV-A The approximation problem

For each $i\in[M]$, we set $c_{i}:=n_{i}/n$ and rewrite problem $\mathbf{P_{1}}$ as Problem $\mathbf{P_{2}}$:

\max_{c_{1},\ldots,c_{M}}\sum_{i=1}^{M}c_{i}n\int_{0}^{\infty}x\left(F_{i}(x)\right)^{c_{i}n-1}f_{i}(x)\,dx, \quad (19)

subject to

\sum_{i=1}^{M}c_{i}=K, \quad (20)

and $0\leq c_{i}\leq 1$, for all $i\in[M]$. We denote by $G(c_{1},\ldots,c_{M})$ the objective function in (19), where $(c_{1},\ldots,c_{M})$ lies in the set

\mathcal{D}:=\left\{(c_{1},\ldots,c_{M})\in[0,1]^{M}\,:\,\sum_{i=1}^{M}c_{i}=K\right\}.

For simplicity, we assume that the function $G(c_{1},\ldots,c_{M})$ is differentiable everywhere on $\mathcal{D}$. Then $G$ is continuous on the compact set $\mathcal{D}$, and by the extreme value theorem [16, Theorem 4.16] it achieves a maximum on $\mathcal{D}$. By the method of Lagrange multipliers, provided that the induced system of equations has a solution, we obtain the solution of $\mathbf{P_{2}}$, denoted by $(\hat{c}_{1},\ldots,\hat{c}_{M})$. Then, within $\mathbf{P_{1}}$, we replace the constraint $0\leq n_{i}\leq n$ by

\lfloor\hat{c}_{i}n\rfloor\leq n_{i}\leq\lceil\hat{c}_{i}n\rceil,\quad\forall\;i\in[M],

which reduces the number of possible values of each $n_{i}$ from $n+1$ to $2$, for all $i\in[M]$. Hence, we obtain Problem $\mathbf{P_{3}}$:

\max_{n_{1},\ldots,n_{M}}\sum_{i=1}^{M}n_{i}\int_{0}^{\infty}x\left(F_{i}(x)\right)^{n_{i}-1}f_{i}(x)\,dx, \quad (21)

subject to

\sum_{i=1}^{M}n_{i}=Kn,\qquad\lfloor\hat{c}_{i}n\rfloor\leq n_{i}\leq\lceil\hat{c}_{i}n\rceil,\quad\forall\;i\in[M]. \quad (22)

Solving $\mathbf{P_{3}}$ by exhaustive search, we compute an approximation of the optimal solution of $\mathbf{P_{1}}$.
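
A minimal sketch of the $\mathbf{P_{3}}$ search: given the Lagrange solution $\hat{c}$, we enumerate the $2^{M}$ floor/ceiling roundings of $\hat{c}_{i}n$ and keep the best candidate satisfying (22). The objective used below is the closed form for uniform sequences derived in Section IV-C, and the numerical inputs are ours.

```python
import itertools
import math

def solve_p3(c_hat, n, K, H):
    """Exhaustive search for P3: try every floor/ceiling rounding of
    c_hat[i]*n (2^M candidates, independent of n) and return the best
    one satisfying sum(n_i) == K*n."""
    grids = [(math.floor(c * n), math.ceil(c * n)) for c in c_hat]
    best, best_val = None, -math.inf
    for cand in itertools.product(*grids):
        if sum(cand) == K * n:
            val = H(cand)
            if val > best_val:
                best, best_val = cand, val
    return best, best_val

# objective for uniform sequences (Section IV-C):
# H(n_1,...,n_M) = sum_i [ b_i - (b_i - a_i) / (n_i + 1) ]
ab = [(0.0, 3.0), (0.5, 2.5), (1.0, 2.0)]
H = lambda ns: sum(b - (b - a) / (k + 1) for (a, b), k in zip(ab, ns))
print(solve_p3(c_hat=[0.45, 0.33, 0.22], n=10, K=1, H=H))
```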

IV-B The cost of approximation

We provide a bound on the approximation error for the sample sizes $n_{i}$, which converges to $0$ as $n$ increases. We denote by $H(n_{1},\ldots,n_{M})$ the objective function of problems $\mathbf{P_{1}}$ and $\mathbf{P_{3}}$, and by

  1. (i)

    $(n^{*}_{1},\ldots,n^{*}_{M})$ the optimal solution of problem $\mathbf{P_{1}}$, with $c^{*}_{i}:=n^{*}_{i}/n$, for all $i\in[M]$.

  2. (ii)

    $(\check{n}_{1},\ldots,\check{n}_{M})$ the optimal solution of problem $\mathbf{P_{3}}$, with $\check{c}_{i}:=\check{n}_{i}/n$, for all $i\in[M]$.

We define the cost of approximation as

C:=\big|H(n^{*}_{1},\ldots,n^{*}_{M})-H(\check{n}_{1},\ldots,\check{n}_{M})\big|. \quad (23)

Next, we prove that, under smoothness conditions on $G$, the cost $C$ is bounded by a constant multiple of $e$, where

e:=\|(\hat{c}_{1},\ldots,\hat{c}_{M})-(\check{c}_{1},\ldots,\check{c}_{M})\|. \quad (24)
Theorem IV.1

If $G$ is everywhere differentiable on $\mathcal{D}$, with bounded derivative, then $C\leq Q\,e$, where

Q:=\max\left\{\|\nabla G(c_{1},\ldots,c_{M})\|\,:\,(c_{1},\ldots,c_{M})\in\mathcal{D}\right\}. \quad (25)
Proof:

By the definition of the objective functions $H$ and $G$, we observe that, for any $n_{1},\ldots,n_{M}\in[n]$, it holds that

G\left(\frac{n_{1}}{n},\ldots,\frac{n_{M}}{n}\right)=H(n_{1},\ldots,n_{M}), \quad (26)

which implies that

G(c^{*}_{1},\ldots,c^{*}_{M})=H(n^{*}_{1},\ldots,n^{*}_{M})\geq H(\check{n}_{1},\ldots,\check{n}_{M})=G(\check{c}_{1},\ldots,\check{c}_{M}), \quad (27)

where the inequality follows from the optimality of $(n^{*}_{1},\ldots,n^{*}_{M})$ for problem $\mathbf{P_{1}}$. Also, by the optimality of $(\hat{c}_{1},\ldots,\hat{c}_{M})$ for problem $\mathbf{P_{2}}$, we have

G(\hat{c}_{1},\ldots,\hat{c}_{M})\geq G(c^{*}_{1},\ldots,c^{*}_{M}). \quad (28)

By definition (23) and inequalities (27)-(28), we have

C\leq\big|G(\hat{c}_{1},\ldots,\hat{c}_{M})-G(\check{c}_{1},\ldots,\check{c}_{M})\big|. \quad (29)

Since $G$ is everywhere differentiable with bounded derivative on $\mathcal{D}$, and $\mathcal{D}$ is convex, by Rademacher’s theorem [17, Theorem 1.41] we deduce that

\big|G(\hat{c}_{1},\ldots,\hat{c}_{M})-G(\check{c}_{1},\ldots,\check{c}_{M})\big|\leq Q\,\|(\hat{c}_{1},\ldots,\hat{c}_{M})-(\check{c}_{1},\ldots,\check{c}_{M})\|, \quad (30)

where $Q$ is as defined in (25). This proves the claim. ∎

Remark: By the second constraint in (22), it holds

\hat{c}_{i}-1/n\leq\check{c}_{i}\leq\hat{c}_{i}+1/n, \quad (31)

which implies that $e\to 0$ as $n\to\infty$; by Theorem IV.1, we conclude that $C\to 0$ as $n\to\infty$.

IV-C Uniform densities example

We consider $M$ sequences of $n$ i.i.d. random variables each, which follow $U[a_{i},b_{i}]$ uniform distributions with $0<a_{i}<b_{i}$, for each $i\in[M]$. We can observe only $K$ sequences at each time instant. In this case, problem $\mathbf{P_{2}}$ takes the form

\max_{c_{1},\ldots,c_{M}}\sum_{i=1}^{M}\left(b_{i}-\frac{b_{i}-a_{i}}{c_{i}n+1}\right),

subject to its underlying constraints. By the method of Lagrange multipliers, for each $i\in[M]$, we have

\hat{c}_{i}=\frac{\sqrt{b_{i}-a_{i}}}{\left(\sum_{j=1}^{M}\sqrt{b_{j}-a_{j}}\right)/M}\left(\frac{K}{M}+\frac{1}{n}\right)-\frac{1}{n}. \quad (32)

Remark: For all $i\in[M]$, the differences $b_{i}-a_{i}$ determine the values of $\hat{c}_{i}$, which implies that the variances $(b_{i}-a_{i})^{2}/12$ govern the sampling rates.
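
A quick numerical sketch of (32); the endpoints (equal means, different variances) match the setup of Section V, and the function name is ours.

```python
import math

def c_hat_uniform(a, b, n, K):
    """Closed-form allocation (32) for U[a_i, b_i] sequences: rates are
    proportional to sqrt(b_i - a_i), i.e., to the standard deviations."""
    M = len(a)
    w = [math.sqrt(bi - ai) for ai, bi in zip(a, b)]
    mean_w = sum(w) / M
    return [wi / mean_w * (K / M + 1 / n) - 1 / n for wi in w]

a, b = [0.0, 0.5, 1.0], [3.0, 2.5, 2.0]   # same means, different variances
c = c_hat_uniform(a, b, n=10, K=1)
print([round(ci, 3) for ci in c], "sum =", round(sum(c), 3))  # sum == K
```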

IV-D Gaussian densities example

In various frameworks, e.g., [5, Section 5.1], the sequences follow a Gaussian density $\mathcal{N}(\mu_{i},\sigma^{2}_{i})$, for each $i\in[M]$, and we can observe $K$ sequences at each instant. We assume that the means of the Gaussians are large enough, and the variances relatively small, so that the random variables are positive with probability practically equal to one. According to Blom’s formula [18], and the approximation of the $\operatorname{erf}$ function presented in [19], for $n_{i}$ large enough and for each $i\in[M]$,

\mathsf{E}\left[\max_{1\leq m\leq n_{i}}X_{i}(m)\right]\simeq\mu_{i}+\sigma_{i}\sqrt{\frac{\pi}{2}}\,\frac{10}{8}\left(\frac{1}{n_{i}+0.25}-0.8\right).

Thus, problem 𝐏𝟐\mathbf{P_{2}} reduces to

\max_{c_{1},\ldots,c_{M}}\sum_{i=1}^{M}\frac{\sigma_{i}}{c_{i}n+0.25}, \quad (33)

subject to its defining constraints. By the method of Lagrange multipliers, we obtain

\hat{c}_{i}=\frac{\sqrt{\sigma_{i}}}{\left(\sum_{j=1}^{M}\sqrt{\sigma_{j}}\right)/M}\left(\frac{K}{M}+\frac{0.25}{n}\right)-\frac{0.25}{n},\quad i\in[M]. \quad (34)

We observe that, except for the constant multiplying the $1/n$ terms ($0.25$ in (34) versus $1$ in (32)), the $\hat{c}_{i}$ in (34) exhibit the same dependence on the variances as those in (32), especially for large $n$.
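
The analogous sketch for (34); as with (32), the allocation sums to $K$. The function name and the numerical standard deviations are ours.

```python
import math

def c_hat_gaussian(sigma, n, K):
    """Closed-form allocation (34): sampling rates proportional to
    sqrt(sigma_i), mirroring (32) with 1/n replaced by 0.25/n."""
    M = len(sigma)
    w = [math.sqrt(s) for s in sigma]
    mean_w = sum(w) / M
    return [wi / mean_w * (K / M + 0.25 / n) - 0.25 / n for wi in w]

c = c_hat_gaussian([1.0, 0.5, 0.25], n=10, K=1)
print([round(ci, 3) for ci in c], "sum =", round(sum(c), 3))  # sum == K
```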

V Computational examples

We compare the expected reward of the joint dynamic programming problem (3) with that of the decoupled problem. For the decoupled problem, we precompute the optimal number of observations for each sequence, and then, on each individual sequence, we run either (i) single-sequence dynamic programming (Single-DP) or (ii) the Prophet Inequality thresholding (PI-thresholding) method [3, Corollary 4.7].

We consider $M=3$ sequences, of lengths $n=5,\ldots,10$, that follow three different uniform distributions, $U[0,3]$, $U[0.5,2.5]$, and $U[1,2]$, which have the same mean but different variances. We consider two cases, $K=1$ and $K=2$, and plot the ratio of the expected reward of the decoupled problem, for both Single-DP and PI-thresholding, over that of the joint problem.

For $K=1$, the decoupling approach offers a very good approximation for problem (3), with a ratio above $0.92$ for Single-DP and above $0.91$ for PI-thresholding, for all $n$. We also note that the ratio of the PI-thresholding method is at most $1\%$ smaller than that of the Single-DP method, which is always optimal for a single sequence.

For $K=2$, the ratio is smaller than in the former case, but the approximation is still good, with a ratio above $0.88$ for Single-DP and above $0.87$ for PI-thresholding, for all $n$. In this case, the gap between Single-DP and PI-thresholding is larger, but no more than $10\%$.
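
The following Monte Carlo sketch reproduces the decoupled Single-DP pipeline on the three uniform sequences above; the split $(n_{1},n_{2},n_{3})$ is a feasible choice made for illustration rather than the optimized one, and the printed ratio compares the decoupled reward with the prophet sum $\sum_{i}\mathsf{E}[\max_{1\leq m\leq n_{i}}X_{i}(m)]$, so by the per-sequence Prophet Inequality it must exceed $0.745$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 8, 1
bounds = [(0.0, 3.0), (0.5, 2.5), (1.0, 2.0)]
n_i = [4, 3, 1]                       # a feasible split: sum(n_i) == K * n

def continuation_values(a, b, k, reps=200_000):
    """V[1..k] for one U[a,b] sequence of length k (backward induction)."""
    X = rng.uniform(a, b, reps)
    V = np.zeros(k + 2)
    V[k] = X.mean()
    for m in range(k - 1, 0, -1):
        V[m] = np.maximum(X, V[m + 1]).mean()
    return V

total_dp = sum(continuation_values(a, b, k)[1] for (a, b), k in zip(bounds, n_i))
total_max = sum(b - (b - a) / (k + 1) for (a, b), k in zip(bounds, n_i))  # sum E[max]

# each sequence attains at least 0.745 of its E[max] (Prophet Inequality),
# so this ratio is guaranteed to exceed 0.745
print(f"decoupled reward / prophet sum: {total_dp / total_max:.3f}")
```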

Figure 1: Decoupled-to-joint expected reward ratio for Single-DP and PI-thresholding: (a) $M=3$, $K=1$; (b) $M=3$, $K=2$.

Acknowledgments

This work was supported in part by NSF awards 2008125 and 1956384, through the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign.

References

  • [1] S. Alaei, “Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers,” SIAM Journal on Computing, vol. 43, no. 2, pp. 930–972, 2014.
  • [2] M. Feldman, N. Gravin, and B. Lucier, “Combinatorial auctions via posted prices,” in Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2014, pp. 123–135.
  • [3] J. Correa, P. Foncea, R. Hoeksma, T. Oosterwijk, and T. Vredeveld, “Posted price mechanisms for a random stream of customers,” in Proceedings of the 2017 ACM Conference on Economics and Computation, 2017, pp. 169–186.
  • [4] E. Lee and S. Singla, “Optimal online contention resolution schemes via ex-ante prophet inequalities,” in 26th Annual European Symposium on Algorithms, ESA 2018, August 20–22, 2018, Helsinki, Finland, vol. 112, 2018, pp. 57:1–57:14.
  • [5] Z. Huang, J. A. Joao, A. Rico, A. D. Hilton, and B. C. Lee, “DynaSprint: Microarchitectural sprints with dynamic utility and thermal management,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019, pp. 426–439.
  • [6] M. Epitropou and R. Vohra, “Optimal on-line allocation rules with verification,” in Algorithmic Game Theory: 12th International Symposium, SAGT 2019, Athens, Greece, September 30–October 3, 2019, Proceedings. Springer, 2019, pp. 3–17.
  • [7] R. N. Bhattacharya and E. C. Waymire, Random Walk, Brownian Motion, and Martingales. Springer, 2021.
  • [8] S. Nitinawarat, G. K. Atia, and V. V. Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
  • [9] Q. Xu, Y. Mei, and G. V. Moustakides, “Optimum multi-stream sequential change-point detection with sampling control,” IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7627–7636, 2021.
  • [10] Q. Zhao, Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Springer Nature, 2022.
  • [11] U. Krengel and L. Sucheston, “On semiamarts, amarts, and processes with finite value,” Probability on Banach Spaces, vol. 4, pp. 197–266, 1978.
  • [12] D. Assaf and E. Samuel-Cahn, “Simple ratio prophet inequalities for a mortal with multiple choices,” Journal of Applied Probability, vol. 37, no. 4, pp. 1084–1091, 2000.
  • [13] J. Correa, R. Saona, and B. Ziliotto, “Prophet secretary through blind strategies,” Mathematical Programming, vol. 190, no. 1–2, pp. 483–521, 2021.
  • [14] A. Bubna and A. Chiplunkar, “Prophet inequality: Order selection beats random order,” in Proceedings of the 24th ACM Conference on Economics and Computation, 2023, pp. 302–336.
  • [15] R. P. Kertz, “Stop rule and supremum expectations of i.i.d. random variables: A complete comparison by conjugate duality,” Journal of Multivariate Analysis, vol. 19, no. 1, pp. 88–112, 1986.
  • [16] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill, 1953.
  • [17] N. Weaver, Lipschitz Algebras. World Scientific, 2018.
  • [18] J. Royston, “Algorithm AS 177: Expected normal order statistics (exact and approximate),” Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 31, no. 2, pp. 161–165, 1982.
  • [19] D. Dominici, “Some properties of the inverse error function,” Contemporary Mathematics, vol. 457, pp. 191–204, 2008.