
Affiliations: (1) Naveen Jindal School of Management, University of Texas at Dallas; (2) Department of Computer Science, University of North Texas. Email: shaojie.tang@utdallas.edu

Streaming Adaptive Submodular Maximization

Shaojie Tang (ORCID: 0000-0001-9261-5210) and Jing Yuan (ORCID: 0000-0001-6407-834X)
Abstract

Many sequential decision making problems can be formulated as an adaptive submodular maximization problem. However, most existing studies in this field focus on the pool-based setting, where one can pick items in any order, and there have been few studies of the stream-based setting, where items arrive in an arbitrary order and one must decide immediately upon an item's arrival whether to select it. In this paper, we introduce a new class of utility functions, semi-policywise submodular functions. We develop a series of effective algorithms to maximize a semi-policywise submodular function under the stream-based setting.

1 Introduction

Many machine learning and artificial intelligence tasks can be formulated as an adaptive sequential decision making problem. The goal of such a problem is to sequentially select a group of items, where each selection is based on past observations, in order to maximize some given utility function. It has been shown that in a wide range of applications, including active learning [6] and adaptive viral marketing [16], the utility functions satisfy the property of adaptive submodularity [6], a natural diminishing returns property under the adaptive setting. Several effective solutions have been developed for maximizing an adaptive submodular function subject to various practical constraints. For example, [6] developed a simple adaptive greedy policy that achieves a $1-1/e$ approximation ratio for maximizing an adaptive monotone and adaptive submodular function subject to a cardinality constraint. Recently, [13] extended these studies to the non-monotone setting and proposed a $1/e$-approximate solution for maximizing a non-monotone adaptive submodular function subject to a cardinality constraint. In the same work, they developed a faster algorithm whose running time is linear in the number of items. [14] developed the first constant-factor approximation algorithms subject to more general constraints such as knapsack and $k$-system constraints.

We note that most existing studies focus on the pool-based setting, where one is allowed to select items in any order. In this paper, we tackle this problem under the stream-based setting. In our setting, items arrive one by one in an online fashion and the order of arrivals is decided by an adversary. Upon the arrival of an item, one must decide immediately whether to select it. If the item is selected, then we are able to observe its realized state; otherwise, we skip this item and wait for the next one. Our goal is to adaptively select a group of items in order to maximize the expected utility subject to a knapsack constraint. To solve this problem, we introduce the concept of semi-policywise submodularity, another adaptive extension of the classical notion of submodularity. We show that this property can be found in many real-world applications such as active learning and adaptive viral marketing. We develop a series of simple adaptive policies for this problem and prove that if the utility function is semi-policywise submodular, then our policies achieve constant approximation ratios against the optimal pool-based policy. In particular, for a single cardinality constraint, we develop a stream-based policy that achieves an approximation ratio of $\frac{1-1/e}{4}$. For a general knapsack constraint, we develop a stream-based policy that achieves an approximation ratio of $\frac{1-1/e}{16}$.

2 Related Work

Stream-based submodular optimization

Non-adaptive submodular maximization under the stream-based setting has been extensively studied. For example, [2] develop the first efficient non-adaptive streaming algorithm, SieveStreaming, which achieves a $1/2-\epsilon$ approximation ratio against the optimal solution. Their algorithm requires only a single pass through the data and memory independent of the data size. [10] develop an enhanced streaming algorithm that requires less memory than SieveStreaming. Very recently, [11] propose a new algorithm that works well under the assumption that a single function evaluation is very expensive. [5] extend these studies from the non-adaptive setting to the adaptive setting and develop constant-factor approximation solutions for their problem. However, they assume that items arrive in a random order, which differs significantly from our adversarial arrival model. Our work is also related to submodular prophet inequalities [3, 12]. Although these works also consider an adversarial arrival model, their setting differs from ours in that (1) they assume items are independent and (2) they are allowed to observe an item's state before selecting it.

Adaptive submodular maximization

[6] introduce the concept of adaptive submodularity, which extends the notion of submodularity from sets to policies. They develop a simple adaptive greedy policy that achieves a $1-1/e$ approximation ratio if the function is adaptive monotone and adaptive submodular. When the utility function is non-monotone, [13] show that a randomized greedy policy achieves a $1/e$ approximation ratio subject to a cardinality constraint. Very recently, they generalized this result and developed the first constant-factor approximation algorithms subject to more general constraints such as knapsack and $k$-system constraints [14]. Other variants of adaptive submodular maximization have been studied in [15, 18, 17, 19].

3 Preliminaries

3.1 Items

We consider a set $E$ of $n$ items. Each item $e\in E$ is in a random state $\Phi(e)\in O$, where $O$ represents the set of all possible states. Denote by $\phi$ a realization of $\Phi$, i.e., for each $e\in E$, $\phi(e)$ is a realization of $\Phi(e)$. In the application of experimental design, an item $e$ represents a test, such as a blood pressure test, and $\Phi(e)$ is the outcome of the test, such as high. We assume that there is a known prior probability distribution $p(\phi)=\Pr(\Phi=\phi)$ over realizations $\phi$. The distribution $p$ completely factorizes if realizations are independent; however, we consider a general setting where the realizations may be dependent. For any subset of items $S\subseteq E$, we use $\psi:S\rightarrow O$ to represent a partial realization, and $\mathrm{dom}(\psi)=S$ is called the domain of $\psi$. For a partial realization $\psi$ and a realization $\phi$, we say $\phi$ is consistent with $\psi$, denoted $\phi\sim\psi$, if they are equal everywhere in $\mathrm{dom}(\psi)$. For any two partial realizations $\psi$ and $\psi'$, we say that $\psi$ is a subrealization of $\psi'$, denoted $\psi\subseteq\psi'$, if $\mathrm{dom}(\psi)\subseteq\mathrm{dom}(\psi')$ and they are consistent on $\mathrm{dom}(\psi)$. In addition, each item $e\in E$ has a cost $c(e)$. For any $S\subseteq E$, let $c(S)=\sum_{e\in S}c(e)$ denote the total cost of $S$.
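For concreteness, the following is a minimal Python sketch of this data model; the item names, the uniform toy prior, and the helper names (all_realizations, is_consistent, is_subrealization) are illustrative assumptions and not part of the paper.

```python
from itertools import product

# Toy ground set, state space, and costs (names and values are assumptions).
E = ["blood_pressure", "heart_rate", "glucose"]
O = ["high", "normal"]
cost = {e: 1.0 for e in E}

def all_realizations():
    """Enumerate every full realization phi: E -> O."""
    for states in product(O, repeat=len(E)):
        yield dict(zip(E, states))

# Toy prior p(phi): uniform over all realizations (in general it may be any
# joint distribution, including a correlated one).
prior = {tuple(sorted(phi.items())): 1.0 / (len(O) ** len(E))
         for phi in all_realizations()}

def is_consistent(phi, psi):
    """phi ~ psi: phi agrees with psi everywhere on dom(psi)."""
    return all(phi[e] == s for e, s in psi.items())

def is_subrealization(psi1, psi2):
    """psi1 is a subrealization of psi2: dom(psi1) is contained in dom(psi2)
    and the two agree on dom(psi1)."""
    return all(e in psi2 and psi2[e] == s for e, s in psi1.items())
```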

3.2 Policies

In the stream-based setting, we assume that items arrive one by one in an adversarial order $\sigma$. A policy has to make an irrevocable decision on whether to select an item when it arrives. If an item is selected, then we are able to observe its realized state; otherwise, we cannot observe its realized state. Formally, a stream-based policy is a partial mapping that maps a pair of a partial realization $\psi$ and an item $e$ to a distribution over $\{0,1\}$: $\pi:2^{E}\times O^{E}\times E\rightarrow\mathcal{P}(\{0,1\})$, specifying whether to select the arriving item $e$ based on the current observation $\psi$. For example, if the current observation is $\psi$ and the newly arrived item is $e$, then $\pi(\psi,e)=1$ (resp. $\pi(\psi,e)=0$) indicates that $\pi$ selects (resp. does not select) $e$.

Assume that there is a utility function $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ defined over items and states. Letting $E(\pi,\phi,\sigma)$ denote the subset of items selected by a stream-based policy $\pi$ conditioned on a realization $\phi$ and a sequence of arrivals $\sigma$, the expected utility $f_{avg}(\pi)$ of a stream-based policy $\pi$ conditioned on a sequence of arrivals $\sigma$ can be written as

$$\mathbb{E}[f_{avg}(\pi)\mid\sigma]=\mathbb{E}_{\Phi\sim p,\Pi}[f(E(\pi,\Phi,\sigma),\Phi)]$$

where the expectation is taken over all possible realizations $\Phi$ and the internal randomness of the policy $\pi$.

We next introduce the concept of policy concatenation which will be used in our proofs.

Definition 1 (Policy Concatenation)

Given two policies $\pi$ and $\pi'$, let $\pi@\pi'$ denote a policy that runs $\pi$ first and then runs $\pi'$, ignoring the observations obtained from running $\pi$.

3.2.1 Pool-based policy

When analyzing the performance of our stream-based policy, we compare it against the optimal pool-based policy, which is allowed to select items in any order. Note that any stream-based policy can be viewed as a special case of a pool-based policy; hence, an optimal pool-based policy cannot perform worse than any optimal stream-based policy. By abuse of notation, we still use $\pi$ to represent a pool-based policy. Formally, a pool-based policy can be encoded as a partial mapping $\pi$ that maps partial realizations $\psi$ to a distribution over $E$: $\pi:2^{E}\times O^{E}\rightarrow\mathcal{P}'(E)$. Intuitively, $\pi(\psi)$ specifies which item to select next based on the current observation $\psi$. Letting $E(\pi,\phi)$ denote the subset of items selected by a pool-based policy $\pi$ conditioned on a realization $\phi$, the expected utility $f_{avg}(\pi)$ of a pool-based policy $\pi$ can be written as

$$f_{avg}(\pi)=\mathbb{E}_{\Phi\sim p,\Pi}[f(E(\pi,\Phi),\Phi)]$$

where the expectation is taken over all possible realizations $\Phi$ and the internal randomness of the policy $\pi$. Note that if $\pi$ is a pool-based policy, then for any sequence of arrivals $\sigma$, $f_{avg}(\pi)=\mathbb{E}[f_{avg}(\pi)\mid\sigma]$. This is because the output of a pool-based policy does not depend on the sequence of arrivals.

3.3 Problem Formulation and Additional Notations

Our objective is to find a stream-based policy that maximizes the worst-case expected utility subject to a budget constraint $B$, i.e.,

$$\max_{\pi\in\Omega^{s}}\min_{\sigma}\mathbb{E}[f_{avg}(\pi)\mid\sigma] \tag{1}$$

where $\Omega^{s}=\{\pi\mid\forall\phi,\sigma':c(E(\pi,\phi,\sigma'))\leq B\}$ represents the set of all feasible stream-based policies subject to a knapsack constraint $(c,B)$. That is, a feasible policy must satisfy the budget constraint under all possible realizations and sequences of arrivals.

We next introduce some additional notations and important assumptions in order to facilitate our study.

Definition 2 (Conditional Expected Marginal Utility of an Item)

Given a utility function $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$, the conditional expected marginal utility $\Delta(e\mid\psi)$ of an item $e$ on top of a partial realization $\psi$ is

$$\Delta(e\mid\psi)=\mathbb{E}_{\Phi}[f(\mathrm{dom}(\psi)\cup\{e\},\Phi)-f(\mathrm{dom}(\psi),\Phi)\mid\Phi\sim\psi] \tag{2}$$

where the expectation is taken over $\Phi$ with respect to $p(\phi\mid\psi)=\Pr(\Phi=\phi\mid\Phi\sim\psi)$.
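As an illustration, the following snippet computes $\Delta(e\mid\psi)$ by brute-force enumeration over the realizations consistent with $\psi$, reusing the toy data model sketched in Section 3.1; the names marginal_gain and coverage_utility are assumptions for illustration only, and the utility shown is just a toy example rather than the paper's objective.

```python
def marginal_gain(e, psi, prior, utility):
    """Delta(e | psi): expected marginal gain of e on top of dom(psi),
    averaging over realizations phi consistent with psi (phi ~ psi)."""
    consistent = [(dict(phi), p) for phi, p in prior.items()
                  if all(dict(phi)[x] == s for x, s in psi.items())]
    total = sum(p for _, p in consistent)
    if total == 0:
        raise ValueError("psi has zero probability under the prior")
    dom = set(psi)
    return sum((p / total) * (utility(dom | {e}, phi) - utility(dom, phi))
               for phi, p in consistent)

# A toy utility f(S, phi): the number of distinct states observed among S.
def coverage_utility(S, phi):
    return len({phi[x] for x in S})
```

For instance, marginal_gain("glucose", {"blood_pressure": "high"}, prior, coverage_utility) evaluates $\Delta(e\mid\psi)$ under the toy prior above.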

Definition 3 (Adaptive Submodularity and Monotonicity [6])

A function $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is adaptive submodular with respect to a prior $p(\phi)$ if for any two partial realizations $\psi$ and $\psi'$ such that $\psi\subseteq\psi'$ and any item $e\in E\setminus\mathrm{dom}(\psi')$,

$$\Delta(e\mid\psi)\geq\Delta(e\mid\psi') \tag{3}$$

Moreover, $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is adaptive monotone with respect to a prior $p(\phi)$ if $\Delta(e\mid\psi)\geq 0$ for any partial realization $\psi$ and any item $e\in E\setminus\mathrm{dom}(\psi)$.

Definition 4 (Conditional Expected Marginal Utility of a Pool-based Policy)

Given a utility function $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$, the conditional expected marginal utility $\Delta(\pi\mid\psi)$ of a pool-based policy $\pi$ on top of a partial realization $\psi$ is

$$\Delta(\pi\mid\psi)=\mathbb{E}_{\Phi,\Pi}[f(E(\pi,\Phi),\Phi)-f(\mathrm{dom}(\psi),\Phi)\mid\Phi\sim\psi]$$

where the expectation is taken over $\Phi$ with respect to $p(\phi\mid\psi)=\Pr(\Phi=\phi\mid\Phi\sim\psi)$ and the internal randomness of $\pi$.

We next introduce a new class of stochastic functions.

Definition 5 (Semi-Policywise Submodularity)

A function $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is semi-policywise submodular with respect to a prior $p(\phi)$ and a knapsack constraint $(c,B)$ if for any partial realization $\psi$,

$$f_{avg}(\pi^{*})\geq\max_{\pi\in\Omega^{p}}\Delta(\pi\mid\psi) \tag{4}$$

where $\Omega^{p}$ denotes the set of all pool-based policies subject to the knapsack constraint $(c,B)$, i.e., $\Omega^{p}=\{\pi\mid\forall\phi: c(E(\pi,\phi))\leq B\}$, and

$$\pi^{*}\in\operatorname*{arg\,max}_{\pi\in\Omega^{p}} f_{avg}(\pi)$$

represents an optimal pool-based policy subject to $(c,B)$.

In the rest of this paper, we always assume that our utility function $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is adaptive monotone, adaptive submodular, and semi-policywise submodular with respect to a prior $p(\phi)$ and a knapsack constraint $(c,B)$. In the appendix, we show that this type of function can be found in a variety of important real-world applications. All missing proofs and materials are deferred to the appendix.

4 Uniform Cost

We first study the case when all items have uniform costs, i.e., $\forall e\in E, c(e)=1$. Without loss of generality, assume $B$ is a positive integer. To solve this problem, we extend the non-adaptive solution of [2] to the adaptive setting.

4.1 Algorithm Design

Algorithm 1 Online Adaptive Policy $\pi^{c}$
1: $S=\emptyset$; $i=1$; $t=1$; $\psi_{1}=\emptyset$.
2: while $i\leq n$ and $|S|<B$ do
3:   if $\Delta(\sigma(i)\mid\psi_{t})\geq\frac{v}{2B}$ then
4:     $S\leftarrow S\cup\{\sigma(i)\}$; $\psi_{t+1}\leftarrow\psi_{t}\cup\{(\sigma(i),\Phi(\sigma(i)))\}$; $t\leftarrow t+1$;
5:   $i=i+1$;
6: return $S$

Recall that $\pi^{*}\in\operatorname*{arg\,max}_{\pi\in\Omega^{p}}f_{avg}(\pi)$ represents an optimal pool-based policy subject to a budget constraint $B$. Suppose we can estimate $f_{avg}(\pi^{*})$ approximately, i.e., we know a value $v$ such that $\beta\cdot f_{avg}(\pi^{*})\geq v\geq\alpha\cdot f_{avg}(\pi^{*})$ for some $\alpha\in[0,1]$ and $\beta\in[1,2]$. Our policy, called Online Adaptive Policy $\pi^{c}$, starts with an empty set $S=\emptyset$. In each subsequent iteration $i$, after observing an arriving item $\sigma(i)$, $\pi^{c}$ adds $\sigma(i)$ to $S$ if the marginal value of $\sigma(i)$ on top of the current partial realization $\psi_{t}$ is at least $\frac{v}{2B}$; otherwise, it skips $\sigma(i)$. This process iterates until there are no more arriving items or the cardinality constraint is reached. A detailed description of $\pi^{c}$ is listed in Algorithm 1, and a short implementation sketch follows below.
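The following is a minimal Python sketch of $\pi^{c}$, assuming a marginal_gain(e, psi) oracle (e.g., the earlier snippet with the prior and utility bound in advance) and an observe_state routine that reveals an item's state only after it has been selected; all names are illustrative.

```python
def online_adaptive_policy(stream, B, v, marginal_gain, observe_state):
    """Online Adaptive Policy pi^c for uniform costs (a sketch of Algorithm 1).

    stream         -- items in adversarial arrival order sigma
    B              -- cardinality budget
    v              -- offline estimate of f_avg(pi^*)
    marginal_gain  -- oracle for Delta(e | psi)
    observe_state  -- reveals Phi(e); may be called only after selecting e
    """
    S, psi = [], {}
    threshold = v / (2 * B)
    for e in stream:
        if len(S) >= B:
            break
        if marginal_gain(e, psi) >= threshold:   # keep only high-value items
            S.append(e)
            psi[e] = observe_state(e)            # state revealed upon selection
    return S
```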

4.2 Performance Analysis

We present the main result of this section in the following theorem.

Theorem 4.1

Assuming that we know a value $v$ such that $\beta\cdot f_{avg}(\pi^{*})\geq v\geq\alpha\cdot f_{avg}(\pi^{*})$ for some $\beta\in[1,2]$ and $\alpha\in[0,1]$, we have $\mathbb{E}[f_{avg}(\pi^{c})\mid\sigma]\geq\min\{\frac{\alpha}{4},\frac{2-\beta}{4}\}f_{avg}(\pi^{*})$ for any sequence of arrivals $\sigma$.

4.3 Offline Estimation of $f_{avg}(\pi^{*})$

Recall that the design of $\pi^{c}$ requires a good approximation of $f_{avg}(\pi^{*})$. We next explain how to obtain such an estimate. It is well known that a simple greedy pool-based policy $\pi^{g}$ (outlined in Algorithm 2) provides a $(1-1/e)$ approximation for the pool-based adaptive submodular maximization problem subject to a cardinality constraint [6], i.e., $f_{avg}(\pi^{g})\geq(1-1/e)f_{avg}(\pi^{*})$. Hence, $f_{avg}(\pi^{g})$ is a good approximation of $f_{avg}(\pi^{*})$. In particular, if we set $v=f_{avg}(\pi^{g})$, then we have $f_{avg}(\pi^{*})\geq v\geq(1-1/e)f_{avg}(\pi^{*})$. This, together with Theorem 4.1, implies that $\pi^{c}$ achieves a $\frac{1-1/e}{4}$ approximation ratio against $\pi^{*}$. One can estimate the value of $f_{avg}(\pi^{g})$ by simulating $\pi^{g}$ on every possible realization $\phi$ to obtain $E(\pi^{g},\phi)$ and letting $f_{avg}(\pi^{g})=\sum_{\phi}p(\phi)f(E(\pi^{g},\phi),\phi)$. When the number of possible realizations is large, one can sample a set of realizations according to $p(\phi)$ and run the simulation on the samples; a Monte Carlo sketch of this estimation is given after Algorithm 2. Although obtaining a good estimate of $f_{avg}(\pi^{g})$ may be time consuming, this only needs to be done once in an offline manner. Thus, it does not contribute to the running time of the online implementation of $\pi^{c}$.

Algorithm 2 Offline Adaptive Greedy Policy $\pi^{g}$
1: $S=\emptyset$; $t=1$; $\psi_{1}=\emptyset$.
2: while $t\leq B$ do
3:   let $e'=\operatorname*{arg\,max}_{e\in E}\Delta(e\mid\psi_{t})$;
4:   $S\leftarrow S\cup\{e'\}$; $\psi_{t+1}\leftarrow\psi_{t}\cup\{(e',\Phi(e'))\}$; $t\leftarrow t+1$;
5: return $S$

5 Nonuniform Cost

We next study the general case when items have nonuniform costs.

5.1 Algorithm Design

Algorithm 3 Online Adaptive Policy with Nonuniform Cost $\pi^{k}$
1: $S=\emptyset$; $t=1$; $i=1$; $\psi_{1}=\emptyset$.
2: while $i\leq n$ do
3:   if $\frac{\Delta(\sigma(i)\mid\psi_{t})}{c(\sigma(i))}\geq\frac{v}{2B}$ then
4:     if $\sum_{e\in S}c(e)+c(\sigma(i))>B$ then
5:       break;
6:     else
7:       $S\leftarrow S\cup\{\sigma(i)\}$; $\psi_{t+1}\leftarrow\psi_{t}\cup\{(\sigma(i),\Phi(\sigma(i)))\}$; $t\leftarrow t+1$;
8:   $i=i+1$
9: return $S$

Suppose we can estimate $f_{avg}(\pi^{*})$ approximately, i.e., we know a value $v$ such that $\beta\cdot f_{avg}(\pi^{*})\geq v\geq\alpha\cdot f_{avg}(\pi^{*})$ for some $\alpha\in[0,1]$ and $\beta\in[1,2]$. For each $e\in E$, let $f(e)$ denote $\mathbb{E}_{\Phi}[f(\{e\},\Phi)]$ for short. Our policy randomly selects a solution from $\{e^{*}\}$ and $\pi^{k}$ with equal probability, where $e^{*}=\operatorname*{arg\,max}_{e\in E}f(e)$ is the best singleton and $\pi^{k}$, called the Online Adaptive Policy with Nonuniform Cost, is a density-greedy policy. Hence, the expected utility of our policy is $(f(e^{*})+\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma])/2$ for any given sequence of arrivals $\sigma$. We next explain the design of $\pi^{k}$. $\pi^{k}$ starts with an empty set $S=\emptyset$. In each subsequent iteration $i$, after observing an arriving item $\sigma(i)$, it adds $\sigma(i)$ to $S$ if the marginal value per unit budget of $\sigma(i)$ on top of the current partial realization $\psi_{t}$ is at least $\frac{v}{2B}$, i.e., $\frac{\Delta(\sigma(i)\mid\psi_{t})}{c(\sigma(i))}\geq\frac{v}{2B}$, and adding $\sigma(i)$ to $S$ does not violate the budget constraint; otherwise, if $\frac{\Delta(\sigma(i)\mid\psi_{t})}{c(\sigma(i))}<\frac{v}{2B}$, $\pi^{k}$ skips $\sigma(i)$. This process iterates until there are no more arriving items or it reaches the first item (excluded) that would violate the budget constraint. A detailed description of $\pi^{k}$ is listed in Algorithm 3, and a short implementation sketch of the full randomized policy follows below.
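The following is a minimal sketch of the full randomized policy, assuming the same marginal_gain and observe_state oracles as before and a precomputed best singleton e_star; all names are illustrative.

```python
import random

def online_adaptive_policy_knapsack(stream, B, v, cost, marginal_gain, observe_state):
    """Density-greedy stream policy pi^k for nonuniform costs (a sketch of Algorithm 3)."""
    S, psi, spent = [], {}, 0.0
    threshold = v / (2 * B)
    for e in stream:
        if marginal_gain(e, psi) / cost[e] >= threshold:
            if spent + cost[e] > B:
                break                        # stop at the first item that would exceed B
            S.append(e)
            spent += cost[e]
            psi[e] = observe_state(e)        # state revealed upon selection
    return S

def randomized_policy(stream, B, v, cost, marginal_gain, observe_state, e_star):
    """Final policy: with probability 1/2 return the best singleton {e*},
    otherwise run pi^k on the stream."""
    if random.random() < 0.5:
        return [e_star]
    return online_adaptive_policy_knapsack(stream, B, v, cost, marginal_gain, observe_state)
```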

5.2 Performance Analysis

Before presenting the main theorem, we first introduce a technical lemma.

Lemma 1

Assuming that we know a value $v$ such that $\beta\cdot f_{avg}(\pi^{*})\geq v\geq\alpha\cdot f_{avg}(\pi^{*})$ for some $\alpha\in[0,1]$ and $\beta\in[1,2]$, we have $\max\{f(e^{*}),\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]\}\geq\min\{\frac{\alpha}{4},\frac{2-\beta}{4}\}f_{avg}(\pi^{*})$ for any sequence of arrivals $\sigma$.

Proof: We first introduce an auxiliary policy $\pi^{k+}$ that follows the same procedure as $\pi^{k}$ except that $\pi^{k+}$ is allowed to add the first item that violates the budget constraint. Although $\pi^{k+}$ is not necessarily feasible, we next show that the expected utility $\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]$ of $\pi^{k+}$ is upper bounded by $\max\{f(e^{*}),\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]\}$ for any sequence of arrivals $\sigma$, i.e., $\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]\leq\max\{f(e^{*}),\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]\}$.

Proposition 1

For any sequence of arrivals $\sigma$,

$$\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]\leq\max\{f(e^{*}),\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]\}$$

Proposition 1, whose proof is deferred to the appendix, implies that to prove this lemma, it suffices to show that $\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]\geq\min\{\frac{\alpha}{4},\frac{2-\beta}{4}\}f_{avg}(\pi^{*})$. The rest of the analysis is devoted to proving this inequality for any fixed sequence of arrivals $\sigma$. We use $\lambda=\{\psi_{1}^{\lambda},\psi_{2}^{\lambda},\psi_{3}^{\lambda},\cdots,\psi_{z^{\lambda}}^{\lambda}\}$ to denote a fixed run of $\pi^{k+}$, where $\psi_{t}^{\lambda}$ is the partial realization of the first $t$ selected items and $z^{\lambda}$ is the total number of selected items under $\lambda$. Let $U=\{\lambda\mid\Pr[\lambda]>0\}$ represent all possible runs of $\pi^{k+}$, $U^{+}$ represent those runs where $\pi^{k+}$ meets or exceeds the budget, i.e., $U^{+}=\{\lambda\in U\mid c(\mathrm{dom}(\psi_{z^{\lambda}}^{\lambda}))\geq B\}$, and $U^{-}$ represent those runs where $\pi^{k+}$ does not use up the budget, i.e., $U^{-}=\{\lambda\in U\mid c(\mathrm{dom}(\psi_{z^{\lambda}}^{\lambda}))<B\}$. Therefore, $U=U^{+}\cup U^{-}$. For each $\lambda$ and $t\in[z^{\lambda}]$, let $e^{\lambda}_{t}$ denote the $t$-th selected item under $\lambda$. Define $\psi_{0}^{\lambda}=\emptyset$ for any $\lambda$. Using the above notation, we can represent $\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]$ as follows:

$$\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]=\sum_{\lambda\in U}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{5}$$
$$=\underbrace{\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)}_{I}+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{6}$$

Then we consider two cases. We first consider the case when $\sum_{\lambda\in U^{+}}\Pr[\lambda]\geq 1/2$ and show that the value of $I$ is lower bounded by $\frac{\alpha}{4}f_{avg}(\pi^{*})$. According to the definition of $U^{+}$, we have $\sum_{t\in[z^{\lambda}]}c(e^{\lambda}_{t})\geq B$ for any $\lambda\in U^{+}$. Moreover, recall that for all $t\in[z^{\lambda}]$, $\frac{\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})}{c(e^{\lambda}_{t})}\geq\frac{v}{2B}$ due to the design of our algorithm. Therefore, for any $\lambda\in U^{+}$,

$$\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\geq\frac{v}{2B}\times B=\frac{v}{2} \tag{7}$$

Because we assume that $\sum_{\lambda\in U^{+}}\Pr[\lambda]\geq 1/2$, we have

$$\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)\geq\Big(\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big)\times\frac{v}{2}\geq\frac{v}{4}\geq\frac{\alpha}{4}f_{avg}(\pi^{*}) \tag{8}$$

The first inequality is due to (7), the second inequality is due to the assumption that $\sum_{\lambda\in U^{+}}\Pr[\lambda]\geq 1/2$, and the third inequality is due to the assumption that $v\geq\alpha\cdot f_{avg}(\pi^{*})$. We conclude that the value of $I$ (and thus $\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]$) is no less than $\frac{\alpha}{4}f_{avg}(\pi^{*})$, i.e.,

$$\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]\geq\frac{\alpha}{4}f_{avg}(\pi^{*}) \tag{9}$$

We next consider the case when $\sum_{\lambda\in U^{+}}\Pr[\lambda]<1/2$. We show that under this case,

$$\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]\geq\frac{2-\beta}{4}f_{avg}(\pi^{*}) \tag{10}$$

Because $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is adaptive monotone, we have $\mathbb{E}[f_{avg}(\pi^{k+}@\pi^{*})\mid\sigma]\geq f_{avg}(\pi^{*})$. To prove (10), it suffices to show that

$$\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]\geq\frac{2-\beta}{4}\mathbb{E}[f_{avg}(\pi^{k+}@\pi^{*})\mid\sigma]$$

Observe that we can represent the gap between $f_{avg}(\pi^{k+}@\pi^{*})$ and $f_{avg}(\pi^{k+})$ conditioned on $\sigma$ as follows:

$$\mathbb{E}[f_{avg}(\pi^{k+}@\pi^{*})-f_{avg}(\pi^{k+})\mid\sigma]=\sum_{\lambda\in U}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda}) \tag{11}$$
$$=\underbrace{\sum_{\lambda\in U^{+}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})}_{II}+\underbrace{\sum_{\lambda\in U^{-}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})}_{III} \tag{12}$$

Because $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is semi-policywise submodular with respect to $p(\phi)$ and $(c,B)$, we have $\max_{\pi\in\Omega^{p}}\Delta(\pi\mid\psi_{z^{\lambda}}^{\lambda})\leq f_{avg}(\pi^{*})$. Moreover, because $\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\max_{\pi\in\Omega^{p}}\Delta(\pi\mid\psi_{z^{\lambda}}^{\lambda})$, we have

$$\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq f_{avg}(\pi^{*}) \tag{13}$$

It follows that

$$II=\sum_{\lambda\in U^{+}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\Big(\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big)f_{avg}(\pi^{*}) \tag{14}$$

Next, we show that $III$ is upper bounded by $(\sum_{\lambda\in U^{-}}\Pr[\lambda])\frac{\beta}{2}f_{avg}(\pi^{*})$. For any $\psi_{z^{\lambda}}^{\lambda}$, we number all items $e\in E$ in decreasing order of the ratio $\frac{\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})}{c(e)}$, i.e., $e(1)\in\arg\max_{e\in E}\frac{\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})}{c(e)}$. Let $l=\min\{i\in\mathbb{N}\mid\sum_{j=1}^{i}c(e(j))\geq B\}$. Define $D(\psi_{z^{\lambda}}^{\lambda})=\{e(i)\in E\mid i\in[l]\}$ as the set containing the first $l$ items. Intuitively, $D(\psi_{z^{\lambda}}^{\lambda})$ is a set of best-looking items conditional on $\psi_{z^{\lambda}}^{\lambda}$. Consider any $e\in D(\psi_{z^{\lambda}}^{\lambda})$, and assume $e$ is the $i$-th item in $D(\psi_{z^{\lambda}}^{\lambda})$; let

$$x(e,\psi_{z^{\lambda}}^{\lambda})=\min\Big\{1,\frac{B-\sum_{s\in\cup_{j\in[i-1]}\{e(j)\}}c(s)}{c(e)}\Big\}$$

where $\cup_{j\in[i-1]}\{e(j)\}$ represents the first $i-1$ items in $D(\psi_{z^{\lambda}}^{\lambda})$.

In analogy to Lemma 1 of [9], we have

$$\sum_{e\in D(\psi_{z^{\lambda}}^{\lambda})}x(e,\psi_{z^{\lambda}}^{\lambda})\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})\geq\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda}) \tag{15}$$

Note that for every $\lambda\in U^{-}$, we have $\sum_{t\in[z^{\lambda}]}c(e^{\lambda}_{t})<B$, that is, $\pi^{k+}$ does not use up the budget under $\lambda$. This, together with the design of $\pi^{k+}$, indicates that for any $e\in E$, its benefit-to-cost ratio on top of $\psi_{z^{\lambda}}^{\lambda}$ is less than $\frac{v}{2B}$, i.e., $\frac{\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})}{c(e)}<\frac{v}{2B}$. Therefore,

$$\sum_{e\in D(\psi_{z^{\lambda}}^{\lambda})}x(e,\psi_{z^{\lambda}}^{\lambda})\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})\leq B\times\frac{v}{2B}=\frac{v}{2} \tag{16}$$

(15) and (16) imply that

$$\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\frac{v}{2} \tag{17}$$

We next provide an upper bound on $III$:

$$III=\sum_{\lambda\in U^{-}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\Big(\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big)\frac{v}{2} \tag{18}$$
$$\leq\Big(\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big)\frac{\beta}{2}f_{avg}(\pi^{*}) \tag{19}$$

where the first inequality is due to (17) and the second inequality is due to $v\leq\beta\cdot f_{avg}(\pi^{*})$.

Now we are in a position to bound the value of $\mathbb{E}[f_{avg}(\pi^{k+}@\pi^{*})-f_{avg}(\pi^{k+})\mid\sigma]$:

$$\mathbb{E}[f_{avg}(\pi^{k+}@\pi^{*})-f_{avg}(\pi^{k+})\mid\sigma]=II+III \tag{20}$$
$$\leq\Big(\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big)f_{avg}(\pi^{*})+\Big(\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big)\frac{\beta}{2}f_{avg}(\pi^{*}) \tag{21}$$
$$\leq\frac{1}{2}f_{avg}(\pi^{*})+\frac{1}{2}\times\frac{\beta}{2}f_{avg}(\pi^{*}) \tag{22}$$
$$=\frac{2+\beta}{4}f_{avg}(\pi^{*}) \tag{23}$$

The first inequality is due to (14) and (19). The second inequality is due to $\sum_{\lambda\in U^{+}}\Pr[\lambda]+\sum_{\lambda\in U^{-}}\Pr[\lambda]=1$ and the assumptions that $\sum_{\lambda\in U^{+}}\Pr[\lambda]<1/2$ and $\beta\in[1,2]$. Because $\mathbb{E}[f_{avg}(\pi^{k+}@\pi^{*})\mid\sigma]\geq\mathbb{E}[f_{avg}(\pi^{*})\mid\sigma]$, which follows from the adaptive monotonicity of $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$, we have

$$\mathbb{E}[f_{avg}(\pi^{*})-f_{avg}(\pi^{k+})\mid\sigma]\leq\mathbb{E}[f_{avg}(\pi^{k+}@\pi^{*})-f_{avg}(\pi^{k+})\mid\sigma] \tag{24}$$
$$\leq\frac{2+\beta}{4}f_{avg}(\pi^{*}) \tag{25}$$

where the second inequality is due to (23). This, together with the fact that $\mathbb{E}[f_{avg}(\pi^{*})\mid\sigma]=f_{avg}(\pi^{*})$, i.e., the output of the optimal pool-based policy does not depend on the sequence of arrivals, implies (10).

Combining the above two cases ((9) and (10)), we have

$$\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]\geq\min\{\frac{\alpha}{4},\frac{2-\beta}{4}\}f_{avg}(\pi^{*}) \tag{26}$$

This, together with Proposition 1, immediately concludes the proof of this lemma. $\Box$

Recall that our final policy randomly picks a solution from $\{e^{*}\}$ and $\pi^{k}$ with equal probability; thus, its expected utility is $\frac{f(e^{*})+\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]}{2}$, which is lower bounded by $\frac{\max\{f(e^{*}),\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]\}}{2}$. This, together with Lemma 1, implies the following main theorem.

Theorem 5.1

If we randomly pick a solution from $\{e^{*}\}$ and $\pi^{k}$ with equal probability, then it achieves a $\min\{\frac{\alpha}{8},\frac{2-\beta}{8}\}$ approximation ratio against the optimal pool-based policy $\pi^{*}$.

5.3 Offline Estimation of $f_{avg}(\pi^{*})$

Algorithm 4 Offline Greedy Policy with Nonuniform Cost $\pi^{gn}$
1: $S=\emptyset$; $t=1$; $\psi_{1}=\emptyset$.
2: while $t\leq B$ do
3:   let $e'=\operatorname*{arg\,max}_{e\in E}\frac{\Delta(e\mid\psi_{t})}{c(e)}$;
4:   if $\sum_{e\in S}c(e)+c(e')>B$ then
5:     break;
6:   $S\leftarrow S\cup\{e'\}$; $\psi_{t+1}\leftarrow\psi_{t}\cup\{(e',\Phi(e'))\}$; $t\leftarrow t+1$;
7: return $S$

To complete the design of $\pi^{k}$, we next explain how to estimate the utility $f_{avg}(\pi^{*})$ of the optimal pool-based policy. It has been shown that the better of $\{e^{*}\}$ and a pool-based density-greedy policy $\pi^{gn}$ (Algorithm 4) achieves a $(1-1/e)/2$ approximation for the pool-based adaptive submodular maximization problem subject to a knapsack constraint [20], i.e., $\max\{f_{avg}(\pi^{gn}),f(e^{*})\}\geq\frac{1-1/e}{2}f_{avg}(\pi^{*})$. If we set $v=\max\{f_{avg}(\pi^{gn}),f(e^{*})\}$ in $\pi^{k}$, then we have $\alpha=(1-1/e)/2$ and $\beta=1$. This, together with Theorem 5.1, implies that our final randomized policy achieves a $\frac{1-1/e}{16}$ approximation ratio against $\pi^{*}$. One can estimate the value of $f_{avg}(\pi^{gn})$ by simulating $\pi^{gn}$ on every possible realization $\phi$ to obtain $E(\pi^{gn},\phi)$ and letting $f_{avg}(\pi^{gn})=\sum_{\phi}p(\phi)f(E(\pi^{gn},\phi),\phi)$. To estimate the value of $f(e^{*})$, one can compute the value of $f(e)$ as $f(e)=\sum_{\phi}p(\phi)f(\{e\},\phi)$ for all $e\in E$ and then return the largest value as $f(e^{*})$. A short sketch of this estimation is given below.
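As a complement to the Monte Carlo sketch in Section 4.3, the following snippet estimates $v=\max\{f_{avg}(\pi^{gn}),f(e^{*})\}$ by sampling, assuming a density_greedy_policy simulator of $\pi^{gn}$ on a fixed realization and the same sample_realization, utility, and marginal_gain oracles as before; all names are illustrative.

```python
def density_greedy_policy(phi, B, E, cost, marginal_gain):
    """Simulate the offline density-greedy policy pi^gn on a fixed realization phi."""
    S, psi, spent = [], {}, 0.0
    while True:
        candidates = [e for e in E if e not in psi]
        if not candidates:
            break
        best = max(candidates, key=lambda e: marginal_gain(e, psi) / cost[e])
        if spent + cost[best] > B:
            break                              # stop before exceeding the budget
        S.append(best)
        spent += cost[best]
        psi[best] = phi[best]
    return S

def estimate_v(sample_realization, utility, B, E, cost, marginal_gain, num_samples=1000):
    """Monte Carlo estimate of v = max{ f_avg(pi^gn), f(e*) }."""
    greedy_total = 0.0
    singleton_totals = {e: 0.0 for e in E}
    for _ in range(num_samples):
        phi = sample_realization()
        greedy_total += utility(set(density_greedy_policy(phi, B, E, cost, marginal_gain)), phi)
        for e in E:
            singleton_totals[e] += utility({e}, phi)
    f_gn = greedy_total / num_samples
    f_star = max(singleton_totals.values()) / num_samples
    return max(f_gn, f_star)
```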

References

  • [1] Adibi, A., Mokhtari, A., Hassani, H.: Submodular meta-learning. Advances in Neural Information Processing Systems 33 (2020)
  • [2] Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: Massive data summarization on the fly. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 671–680 (2014)
  • [3] Chekuri, C., Livanos, V.: On submodular prophet inequalities and correlation gap. arXiv preprint arXiv:2107.03662 (2021)
  • [4] Cuong, N.V., Lee, W.S., Ye, N., Chai, K.M., Chieu, H.L.: Active learning for probabilistic hypotheses using the maximum gibbs error criterion. Advances in Neural Information Processing Systems 26 (NIPS 2013) pp. 1457–1465 (2013)
  • [5] Fujii, K., Kashima, H.: Budgeted stream-based active learning via adaptive submodular maximization. In: NIPS. vol. 16, pp. 514–522 (2016)
  • [6] Golovin, D., Krause, A.: Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research 42, 427–486 (2011)
  • [7] Golovin, D., Krause, A., Ray, D.: Near-optimal bayesian active learning with noisy observations. In: NIPS (2010)
  • [8] Gonen, A., Sabato, S., Shalev-Shwartz, S.: Efficient active learning of halfspaces: an aggressive approach. In: International Conference on Machine Learning. pp. 480–488. PMLR (2013)
  • [9] Gotovos, A., Karbasi, A., Krause, A.: Non-monotone adaptive submodular maximization. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
  • [10] Kazemi, E., Mitrovic, M., Zadimoghaddam, M., Lattanzi, S., Karbasi, A.: Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In: International Conference on Machine Learning. pp. 3311–3320. PMLR (2019)
  • [11] Kuhnle, A.: Quick streaming algorithms for maximization of monotone submodular functions in linear time. In: International Conference on Artificial Intelligence and Statistics. pp. 1360–1368. PMLR (2021)
  • [12] Rubinstein, A., Singla, S.: Combinatorial prophet inequalities. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 1671–1687. SIAM (2017)
  • [13] Tang, S.: Beyond pointwise submodularity: Non-monotone adaptive submodular maximization in linear time. Theoretical Computer Science 850, 249–261 (2021)
  • [14] Tang, S.: Beyond pointwise submodularity: Non-monotone adaptive submodular maximization subject to knapsack and $k$-system constraints. In: 4th international conference on "Modelling, Computation and Optimization in Information Systems and Management Sciences" (2021)
  • [15] Tang, S.: Robust adaptive submodular maximization. CoRR abs/2107.11333 (2021), https://arxiv.org/abs/2107.11333
  • [16] Tang, S., Yuan, J.: Influence maximization with partial feedback. Operations Research Letters 48(1), 24–28 (2020)
  • [17] Tang, S., Yuan, J.: Adaptive regularized submodular maximization. In: 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2021)
  • [18] Tang, S., Yuan, J.: Non-monotone adaptive submodular meta-learning. In: SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21). pp. 57–65. SIAM (2021)
  • [19] Tang, S., Yuan, J.: Optimal sampling gaps for adaptive submodular maximization. In: AAAI (2022)
  • [20] Yuan, J., Tang, S.J.: Adaptive discount allocation in social networks. In: Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing. pp. 1–10 (2017)

6 Appendix

6.1 Proof of Theorem 4.1

Our proof is conducted conditioned on a fixed sequence of arrivals $\sigma$. Let $\lambda=\{\psi_{1}^{\lambda},\psi_{2}^{\lambda},\psi_{3}^{\lambda},\cdots,\psi_{z^{\lambda}}^{\lambda}\}$ denote a fixed run of $\pi^{c}$, where $\psi_{t}^{\lambda}$ is the partial realization of the first $t$ selected items and $z^{\lambda}$ is the total number of selected items under $\lambda$. For any $t\in[1,z^{\lambda}]$, let $e^{\lambda}_{t}$ denote the $t$-th selected item under $\lambda$, i.e., the unique item in $\mathrm{dom}(\psi_{t}^{\lambda})\setminus\mathrm{dom}(\psi_{t-1}^{\lambda})$. Letting $\Pr[\lambda]$ denote the probability that $\lambda$ occurs, we can represent the expected utility of $\pi^{c}$ conditioned on $\sigma$ as follows:

$$\mathbb{E}[f_{avg}(\pi^{c})\mid\sigma]=\sum_{\lambda\in U}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{27}$$
$$=\underbrace{\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)}_{I}+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{28}$$

where $U=\{\lambda\mid\Pr[\lambda]>0\}$ represents all possible runs of $\pi^{c}$, $U^{+}$ represents those runs where $\pi^{c}$ selects exactly $B$ items, i.e., $U^{+}=\{\lambda\in U\mid z^{\lambda}=B\}$, and $U^{-}$ represents those runs where $\pi^{c}$ selects fewer than $B$ items, i.e., $U^{-}=\{\lambda\in U\mid z^{\lambda}<B\}$. Therefore, $U=U^{+}\cup U^{-}$.

We prove this theorem by considering two cases. We first consider the case when $\sum_{\lambda\in U^{+}}\Pr[\lambda]\geq 1/2$ and show that the value of part $I$ of (28) is lower bounded by $\frac{\alpha}{4}f_{avg}(\pi^{*})$. According to the definition of $U^{+}$, we have $z^{\lambda}=B$ for any $\lambda\in U^{+}$. Moreover, recall that for all $t\in[z^{\lambda}]$, $\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\geq\frac{v}{2B}$ due to the design of our algorithm. Therefore, for any $\lambda\in U^{+}$,

$$\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\geq\frac{v}{2B}\times B=\frac{v}{2} \tag{29}$$

It follows that

$$\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)\geq\Big(\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big)\times\frac{v}{2}\geq\frac{v}{4}\geq\frac{\alpha}{4}f_{avg}(\pi^{*}) \tag{30}$$

The first inequality is due to (29), the second inequality is due to the assumption that $\sum_{\lambda\in U^{+}}\Pr[\lambda]\geq 1/2$, and the third inequality is due to the assumption that $v\geq\alpha\cdot f_{avg}(\pi^{*})$. We conclude that the value of $I$ (and thus $\mathbb{E}[f_{avg}(\pi^{c})\mid\sigma]$) is no less than $\frac{\alpha}{4}f_{avg}(\pi^{*})$, i.e.,

$$\mathbb{E}[f_{avg}(\pi^{c})\mid\sigma]\geq I\geq\frac{\alpha}{4}f_{avg}(\pi^{*}) \tag{31}$$

We next consider the case when $\sum_{\lambda\in U^{+}}\Pr[\lambda]<1/2$. We show that $\mathbb{E}[f_{avg}(\pi^{c})\mid\sigma]\geq\frac{2-\beta}{4}f_{avg}(\pi^{*})$ for any sequence of arrivals $\sigma$ under this case. Observe that

$$\mathbb{E}[f_{avg}(\pi^{c}@\pi^{*})-f_{avg}(\pi^{c})\mid\sigma]=\sum_{\lambda\in U}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda}) \tag{32}$$
$$=\underbrace{\sum_{\lambda\in U^{+}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})}_{II}+\underbrace{\sum_{\lambda\in U^{-}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})}_{III} \tag{33}$$

Because $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is semi-policywise submodular with respect to $p(\phi)$ and $(c,B)$, we have $\max_{\pi\in\Omega^{p}}\Delta(\pi\mid\psi_{z^{\lambda}}^{\lambda})\leq\max_{\pi\in\Omega^{p}}f_{avg}(\pi)$. Moreover, because $\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\max_{\pi\in\Omega^{p}}\Delta(\pi\mid\psi_{z^{\lambda}}^{\lambda})$ and $\max_{\pi\in\Omega^{p}}f_{avg}(\pi)=f_{avg}(\pi^{*})$, we have

$$\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq f_{avg}(\pi^{*}) \tag{34}$$

It follows that

$$II=\sum_{\lambda\in U^{+}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\Big(\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big)f_{avg}(\pi^{*}) \tag{35}$$

Next, we show that $III$ is upper bounded by $(\sum_{\lambda\in U^{-}}\Pr[\lambda])\frac{\beta}{2}f_{avg}(\pi^{*})$. For any final partial realization $\psi_{z^{\lambda}}^{\lambda}$, let $M(\psi_{z^{\lambda}}^{\lambda})=\arg\max_{|R|=B}\{\sum_{e\in R}\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})\}$ denote a set of $B$ items having the largest marginal utilities on top of $\psi_{z^{\lambda}}^{\lambda}$. It has been shown that if $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is adaptive submodular, then for any $\psi_{z^{\lambda}}^{\lambda}$,

$$\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\sum_{e\in M(\psi_{z^{\lambda}}^{\lambda})}\Delta(e\mid\psi_{z^{\lambda}}^{\lambda}) \tag{36}$$

Recall that for every $\lambda\in U^{-}$, we have $z^{\lambda}<B$, that is, $\pi^{c}$ selects fewer than $B$ items under $\lambda$. This, together with the design of $\pi^{c}$, indicates that for any $e\in E$, the marginal utility of $e$ on top of $\psi_{z^{\lambda}}^{\lambda}$ is less than $\frac{v}{2B}$, i.e., $\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})<\frac{v}{2B}$. Therefore,

$$\sum_{e\in M(\psi_{z^{\lambda}}^{\lambda})}\Delta(e\mid\psi_{z^{\lambda}}^{\lambda})\leq B\times\frac{v}{2B}=\frac{v}{2} \tag{37}$$

(36) and (37) imply that

$$\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\frac{v}{2} \tag{38}$$

We next provide an upper bound on $III$:

$$III=\sum_{\lambda\in U^{-}}\Pr[\lambda]\,\Delta(\pi^{*}\mid\psi_{z^{\lambda}}^{\lambda})\leq\Big(\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big)\frac{v}{2}\leq\Big(\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big)\frac{\beta}{2}f_{avg}(\pi^{*}) \tag{39}$$

where the first inequality is due to (38) and the second inequality is due to $v\leq\beta\cdot f_{avg}(\pi^{*})$.

Now we are in a position to bound the value of $\mathbb{E}[f_{avg}(\pi^{c}@\pi^{*})-f_{avg}(\pi^{c})\mid\sigma]$:

$$\mathbb{E}[f_{avg}(\pi^{c}@\pi^{*})-f_{avg}(\pi^{c})\mid\sigma]=II+III \tag{40}$$
$$\leq\Big(\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big)f_{avg}(\pi^{*})+\Big(\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big)\frac{\beta}{2}f_{avg}(\pi^{*}) \tag{41}$$
$$\leq\frac{1}{2}f_{avg}(\pi^{*})+\frac{1}{2}\times\frac{\beta}{2}f_{avg}(\pi^{*}) \tag{42}$$
$$=\frac{2+\beta}{4}f_{avg}(\pi^{*}) \tag{43}$$

The first inequality is due to (35) and (39). The second inequality is due to $\sum_{\lambda\in U^{+}}\Pr[\lambda]+\sum_{\lambda\in U^{-}}\Pr[\lambda]=1$ and the assumptions that $\sum_{\lambda\in U^{+}}\Pr[\lambda]<1/2$ and $\beta\in[1,2]$. Because $\mathbb{E}[f_{avg}(\pi^{c}@\pi^{*})\mid\sigma]\geq\mathbb{E}[f_{avg}(\pi^{*})\mid\sigma]$, which follows from the adaptive monotonicity of $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$, we have

$$\mathbb{E}[f_{avg}(\pi^{*})-f_{avg}(\pi^{c})\mid\sigma]\leq\mathbb{E}[f_{avg}(\pi^{c}@\pi^{*})-f_{avg}(\pi^{c})\mid\sigma] \tag{44}$$
$$\leq\frac{2+\beta}{4}f_{avg}(\pi^{*}) \tag{45}$$

where the second inequality is due to (43). This, together with the fact that $\mathbb{E}[f_{avg}(\pi^{*})\mid\sigma]=f_{avg}(\pi^{*})$, i.e., the output of the optimal pool-based policy does not depend on the sequence of arrivals, implies that

$$\mathbb{E}[f_{avg}(\pi^{c})\mid\sigma]\geq\frac{2-\beta}{4}f_{avg}(\pi^{*}) \tag{46}$$

Combining the above two cases ((31) and (46)), we have

$$\mathbb{E}[f_{avg}(\pi^{c})\mid\sigma]\geq\min\{\frac{\alpha}{4},\frac{2-\beta}{4}\}f_{avg}(\pi^{*}) \tag{47}$$

6.2 Proof of Proposition 1

Let $\lambda=\{\psi_{1}^{\lambda},\psi_{2}^{\lambda},\psi_{3}^{\lambda},\cdots,\psi_{z^{\lambda}}^{\lambda}\}$ denote a fixed run of $\pi^{k+}$, where $\psi_{t}^{\lambda}$ is the partial realization of the first $t$ selected items and $z^{\lambda}$ is the total number of selected items under $\lambda$. Let $U=\{\lambda\mid\Pr[\lambda]>0\}$ represent all possible runs of $\pi^{k+}$, $U^{+}$ represent those runs where $\pi^{k+}$ meets or exceeds the budget, i.e., $U^{+}=\{\lambda\in U\mid c(\mathrm{dom}(\psi_{z^{\lambda}}^{\lambda}))\geq B\}$, and $U^{-}$ represent those runs where $\pi^{k+}$ does not use up the budget, i.e., $U^{-}=\{\lambda\in U\mid c(\mathrm{dom}(\psi_{z^{\lambda}}^{\lambda}))<B\}$. Therefore, $U=U^{+}\cup U^{-}$. Using the above notation, we can represent $\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]$ as follows:

$$\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]=\sum_{\lambda\in U}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{48}$$
$$=\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{49}$$

Note that the outputs of $\pi^{k+}$ and $\pi^{k}$ differ in at most one item. This occurs only when $\pi^{k+}$ selects an item that violates the budget constraint. Hence, by removing the last selected item from the output of $\pi^{k+}$ under every $\lambda\in U^{+}$, we obtain a lower bound on the expected utility of $\pi^{k}$, using the same notation as for $\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma]$, as follows:

$$\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]\geq \tag{50}$$
$$\quad\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}-1]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{51}$$

Hence,

$$\mathbb{E}[f_{avg}(\pi^{k})\mid\sigma]+f(e^{*}) \tag{52}$$
$$\geq\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}-1]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{53}$$
$$\quad+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)+f(e^{*}) \tag{54}$$
$$\geq\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}-1]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})+f(e^{*})\Big) \tag{55}$$
$$\quad+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{56}$$
$$\geq\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}-1]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})+\Delta(e^{\lambda}_{z^{\lambda}}\mid\psi_{z^{\lambda}-1}^{\lambda})\Big) \tag{57}$$
$$\quad+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{58}$$
$$=\sum_{\lambda\in U^{+}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big)+\sum_{\lambda\in U^{-}}\Pr[\lambda]\Big(\sum_{t\in[z^{\lambda}]}\Delta(e^{\lambda}_{t}\mid\psi_{t-1}^{\lambda})\Big) \tag{59}$$
$$=\mathbb{E}[f_{avg}(\pi^{k+})\mid\sigma] \tag{60}$$

The third inequality is due to the assumption that $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is adaptive submodular, which implies that $f(e^{*})\geq\Delta(e^{\lambda}_{z^{\lambda}}\mid\psi_{z^{\lambda}-1}^{\lambda})$ for any $\lambda\in U$. $\Box$

6.3 Applications

In this section, we show that both adaptive submodularity and semi-policywise submodularity can be found in several important applications. We first present the concept of policywise submodularity, which was first introduced in [19].

Definition 6 (Policywise Submodularity [19])

A function $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is policywise submodular with respect to a prior $p(\phi)$ and a knapsack constraint $(c,B)$ if for any two partial realizations $\psi$ and $\psi'$ such that $\psi'\subseteq\psi$ and $c(\mathrm{dom}(\psi))\leq B$, and any $S\subseteq E$ such that $S\cap\mathrm{dom}(\psi)=\emptyset$, we have $\max_{\pi\in\Omega}\Delta(\pi\mid\psi')\geq\max_{\pi\in\Omega}\Delta(\pi\mid\psi)$, where $\Omega=\{\pi\mid\forall\phi:c(E(\pi,\phi))\leq B-c(\mathrm{dom}(\psi)),E(\pi,\phi)\subseteq S\}$ denotes the set of feasible policies that are restricted to selecting items only from $S$.

In [19], it has been shown that many existing adaptive submodular functions used in various applications, including pool-based active learning [6, 7, 8, 4], stochastic submodular cover [1], and adaptive viral marketing [6], also satisfy policywise submodularity. Our next lemma shows that policywise submodularity implies semi-policywise submodularity. This indicates that all of the aforementioned applications satisfy both adaptive submodularity and semi-policywise submodularity.

Lemma 2

If $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is policywise submodular and adaptive monotone with respect to $p(\phi)$ and all knapsack constraints, then $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is semi-policywise submodular with respect to $p(\phi)$ and any knapsack constraint $(c,B)$.

Proof: Consider the two partial realizations $\emptyset$ and $\psi$ and any knapsack constraint $(c,B)$; let $B'=B+c(\mathrm{dom}(\psi))$ and $S=E\setminus\mathrm{dom}(\psi)$. Because $\emptyset\subseteq\psi$ and we assume $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is policywise submodular with respect to $p(\phi)$ and all knapsack constraints, including $(c,B')$, we have

$$\max_{\pi\in\Omega}\Delta(\pi\mid\emptyset)\geq\max_{\pi\in\Omega}\Delta(\pi\mid\psi) \tag{61}$$

where $\Omega=\{\pi\mid\forall\phi:c(E(\pi,\phi))\leq B,E(\pi,\phi)\subseteq S\}$. Let $\pi'=\operatorname*{arg\,max}_{\pi\in\Omega^{p}}\Delta(\pi\mid\psi)$ represent an optimal pool-based policy subject to a knapsack constraint $(c,B)$ on top of $\psi$. Due to the definition of $B'$, it is easy to verify that $\max_{\pi\in\Omega}\Delta(\pi\mid\psi)=\Delta(\pi'\mid\psi)$. Hence, (61) indicates that

$$\max_{\pi\in\Omega}\Delta(\pi\mid\emptyset)\geq\Delta(\pi'\mid\psi) \tag{62}$$

Moreover, let $\pi''\in\operatorname*{arg\,max}_{\pi\in\Omega}\Delta(\pi\mid\emptyset)$. Because every policy in $\Omega$ is feasible for $(c,B)$ and $\pi^{*}$ represents the best pool-based policy subject to $(c,B)$, we have $f_{avg}(\pi^{*})\geq f_{avg}(\pi'')\geq\Delta(\pi''\mid\emptyset)=\max_{\pi\in\Omega}\Delta(\pi\mid\emptyset)$, where the second inequality is due to $f(\emptyset,\phi)\geq 0$ for all $\phi$. This, together with (62), implies that $f_{avg}(\pi^{*})\geq\Delta(\pi'\mid\psi)=\max_{\pi\in\Omega^{p}}\Delta(\pi\mid\psi)$. Hence, $f:2^{E\times O}\rightarrow\mathbb{R}_{\geq 0}$ is semi-policywise submodular with respect to $p(\phi)$ and $(c,B)$. $\Box$