
Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time

Kai Han  Zongmai Cao  Shuang Cui  Benwei Wu
School of Computer Science and Technology / Suzhou Research Institute
University of Science and Technology of China
hankai@ustc.edu.cn, {czm18,lakers,wubenwei}@mail.ustc.edu.cn

We study the problem of maximizing a non-monotone, non-negative submodular function subject to a matroid constraint. The prior best-known deterministic approximation ratio for this problem is $\frac{1}{4}-\epsilon$ under $\mathcal{O}(({n^{4}}/{\epsilon})\log n)$ time complexity. We show that this deterministic ratio can be improved to $\frac{1}{4}$ under $\mathcal{O}(nr)$ time complexity, and then present a more practical algorithm dubbed TwinGreedyFast which achieves a $\frac{1}{4}-\epsilon$ deterministic ratio in nearly-linear running time of $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$. Our approach is based on a novel algorithmic framework of simultaneously constructing two candidate solution sets through greedy search, which enables us to get improved performance bounds by fully exploiting the properties of independence systems. As a byproduct of this framework, we also show that TwinGreedyFast achieves a $\frac{1}{2p+2}-\epsilon$ deterministic ratio under a $p$-set system constraint with the same time complexity. To showcase the practicality of our approach, we empirically evaluated the performance of TwinGreedyFast on two network applications, and observed that it outperforms the state-of-the-art deterministic and randomized algorithms with efficient implementations for our problem.

1 Introduction

Submodular function maximization has attracted great interest from both academia and industry due to its wide applications such as crowdsourcing [47], information gathering [35], sensor placement [33], influence maximization [37, 48] and exemplar-based clustering [30]. Due to the large volume of data and the heterogeneous application scenarios in practice, there is a growing demand for designing accurate and efficient submodular maximization algorithms subject to various constraints.

The matroid is an important structure in combinatorial optimization that abstracts and generalizes the notion of linear independence in vector spaces [8]. The problem of submodular maximization subject to a matroid constraint (SMM) has attracted considerable attention since the 1970s. When the considered submodular function $f(\cdot)$ is monotone, the classical work of Fisher et al. [27] presents a deterministic approximation ratio of $1/2$, which remained the best deterministic ratio for decades until Buchbinder et al. [14] recently improved it to $1/2+\epsilon$.

When the submodular function $f(\cdot)$ is non-monotone, the best-known deterministic ratio for the SMM problem is $1/4-\epsilon$, proposed by Lee et al. [38], but with a high time complexity of $\mathcal{O}((n^{4}\log n)/\epsilon)$. Recently, several studies have aimed at designing more efficient and practical algorithms for this problem. In this line of work, the elegant studies of Mirzasoleiman et al. [44] and Feldman et al. [25] propose the best deterministic ratio of $1/6-\epsilon$ and the fastest implementation with an expected ratio of $1/4$, respectively, and their algorithms can also handle more general constraints such as a $p$-set system constraint. For clarity, we list the performance bounds of these works in Table 1. However, it is still unclear whether the $1/4-\epsilon$ deterministic ratio for a single matroid constraint can be further improved, or whether there exist faster algorithms achieving the same $1/4-\epsilon$ deterministic ratio.

In this paper, we propose an approximation algorithm TwinGreedy (Alg. 1) with a deterministic $1/4$ ratio and $\mathcal{O}(nr)$ running time for maximizing a non-monotone, non-negative submodular function subject to a matroid constraint, thus improving the best-known $1/4-\epsilon$ deterministic ratio of Lee et al. [38]. Furthermore, we show that the solution framework of TwinGreedy can be implemented in a more efficient way, and present a new algorithm dubbed TwinGreedyFast with a $1/4-\epsilon$ deterministic ratio and nearly-linear $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$ running time. To the best of our knowledge, TwinGreedyFast is the fastest algorithm achieving the $\frac{1}{4}-\epsilon$ deterministic ratio for our problem in the literature. As a byproduct, we also show that TwinGreedyFast can be used to address a more general $p$-set system constraint and achieves a $\frac{1}{2p+2}-\epsilon$ approximation ratio with the same time complexity.

It is noted that most of the current deterministic algorithms for non-monotone submodular maximization (e.g., [44, 25, 45]) leverage the “repeated greedy-search” framework proposed by Gupta et al. [31], where two or more candidate solution sets are constructed successively and then an unconstrained submodular maximization (USM) algorithm (e.g., [12]) is called to find a good solution among the candidate sets and their subsets. Our approach is based on a novel “simultaneous greedy-search” framework different from theirs, where two disjoint candidate solution sets $S_{1}$ and $S_{2}$ are built simultaneously with only single-pass greedy searching, without calling a USM algorithm. We call these two solution sets $S_{1}$ and $S_{2}$ “twin sets” because they “grow up” simultaneously. Thanks to this framework, we are able to bound the “utility loss” caused by greedy searching using $S_{1}$ and $S_{2}$ themselves, through a careful classification of the elements in an optimal solution $O$ and a mapping of them to the elements in $S_{1}\cup S_{2}$. Furthermore, by incorporating a thresholding method inspired by Badanidiyuru and Vondrák [2] into our framework, the TwinGreedyFast algorithm achieves nearly-linear time complexity by only accepting elements whose marginal gains are no smaller than given thresholds.

We evaluate the performance of TwinGreedyFast on two applications: social network monitoring and multi-product viral marketing. The experimental results show that TwinGreedyFast runs more than an order of magnitude faster than the state-of-the-art efficient algorithms for our problem, and also achieves better solution quality than the currently fastest randomized algorithms in the literature.

Table 1: Approximation for Non-monotone Submodular Maximization over a Matroid
Algorithms | Ratio | Time Complexity | Type
Lee et al. [38] | $1/4-\epsilon$ | $\mathcal{O}((n^{4}\log n)/\epsilon)$ | Deterministic
Mirzasoleiman et al. [44] | $1/6-\epsilon$ | $\mathcal{O}(nr+r/\epsilon)$ | Deterministic
Feldman et al. [25] | $1/4$ | $\mathcal{O}(nr)$ | Randomized
Buchbinder and Feldman [10] | $0.385$ | $\mathrm{poly}(n)$ | Randomized
TwinGreedy (Alg. 1) | $1/4$ | $\mathcal{O}(nr)$ | Deterministic
TwinGreedyFast (Alg. 2) | $1/4-\epsilon$ | $\mathcal{O}((n/\epsilon)\log(r/\epsilon))$ | Deterministic

1.1 Related Work

When the considered submodular function $f(\cdot)$ is monotone, Calinescu et al. [15] propose an algorithm with an optimal $1-1/e$ expected ratio for the problem of submodular maximization subject to a matroid constraint (SMM). The SMM problem appears harder when $f(\cdot)$ is non-monotone, and the current best-known expected ratio is 0.385 [10], obtained after a series of studies [50, 29, 24, 38]. However, all these approaches are based on tools with high time complexity such as the multilinear extension.

There also exist efficient deterministic algorithms for the SMM problem: Gupta et al. [31] are the first to apply the “repeated greedy search” framework described in the last section and achieve a $1/12-\epsilon$ ratio, which is improved to $1/6-\epsilon$ by Mirzasoleiman et al. [44] and Feldman et al. [25]. Under a more general $p$-set system constraint, Mirzasoleiman et al. [44] achieve a $\frac{p}{(p+1)(2p+1)}-\epsilon$ deterministic ratio and Feldman et al. [25] achieve a $\frac{1}{p+2\sqrt{p}+3}-\epsilon$ deterministic ratio (assuming that they use the USM algorithm with a $1/2-\epsilon$ deterministic ratio in [9]). Some studies also propose streaming algorithms under various constraints [45, 32].

As regards efficient randomized algorithms for the SMM problem, the SampleGreedy algorithm in [25] achieves a $1/4$ expected ratio with $\mathcal{O}(nr)$ running time; the algorithms in [11] also achieve a $1/4$ expected ratio with slightly worse time complexity of $\mathcal{O}(nr\log n)$, as well as a $0.283$ expected ratio under cubic time complexity of $\mathcal{O}(nr\log n+r^{3+\epsilon})$ (although the randomized 0.283-approximation algorithm in [11] has a better approximation ratio than TwinGreedy, its time complexity is larger than that of TwinGreedy by at least an additive factor of $\mathcal{O}(r^{3})$, which can be large as $r$ can be in the order of $\Theta(n)$); Chekuri and Quanrud [16] provide a $0.172-\epsilon$ expected ratio under $\mathcal{O}(\log n\log r/\epsilon^{2})$ adaptive rounds; and Feldman et al. [26] propose a $1/(3+2\sqrt{2})$ expected ratio under the streaming setting. It is also noted that Buchbinder et al. [14] provide a de-randomized version of the algorithm in [11] for monotone submodular maximization, which has time complexity of $\mathcal{O}(nr^{2})$. However, it remains an open problem to find the approximation ratio of this de-randomized algorithm for the SMM problem with a non-monotone objective function.

Many elegant studies provide efficient submodular optimization algorithms for monotone submodular functions or for a cardinality constraint [2, 13, 4, 43, 5, 36, 23, 22]. However, these studies have not addressed the problem of non-monotone submodular maximization subject to a general matroid (or $p$-set system) constraint, and our main techniques are essentially different from theirs.

2 Preliminaries

Given a ground set $\mathcal{N}$ with $|\mathcal{N}|=n$, a function $f:2^{\mathcal{N}}\mapsto\mathbb{R}$ is submodular if for all $X,Y\subseteq\mathcal{N}$, $f(X)+f(Y)\geq f(X\cup Y)+f(X\cap Y)$. The function $f(\cdot)$ is called non-negative if $f(X)\geq 0$ for all $X\subseteq\mathcal{N}$, and $f(\cdot)$ is called non-monotone if $\exists X\subset Y\subseteq\mathcal{N}:f(X)>f(Y)$. For brevity, we use $f(X\mid Y)$ to denote $f(X\cup Y)-f(Y)$ for any $X,Y\subseteq\mathcal{N}$, and write $f(\{x\}\mid Y)$ as $f(x\mid Y)$ for any $x\in\mathcal{N}$. We call $f(X\mid Y)$ the “marginal gain” of $X$ with respect to $Y$.
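To make the notation concrete, the following minimal Python sketch (our own illustration, not part of the paper; all function names are ours) implements a value oracle for a weighted graph-cut function, a standard example of a non-negative, non-monotone submodular function, together with the marginal-gain helper $f(x\mid Y)$:

# Illustration only: a weighted cut function f(S) = sum of w(u,v) over edges
# with exactly one endpoint in S (non-negative, non-monotone, submodular),
# plus the marginal gain f(x | Y) = f(Y ∪ {x}) - f(Y) used throughout the paper.

def make_cut_function(edges):
    """edges: list of (u, v, w) with symmetric weights w >= 0."""
    def f(S):
        S = set(S)
        return sum(w for u, v, w in edges if (u in S) != (v in S))
    return f

def marginal_gain(f, x, Y):
    """Return f(x | Y) = f(Y ∪ {x}) - f(Y)."""
    Y = set(Y)
    return f(Y | {x}) - f(Y)

f = make_cut_function([(1, 2, 1.0), (2, 3, 2.0), (1, 3, 0.5)])
print(f({2}), marginal_gain(f, 1, {2}))   # prints 3.0 and -0.5 (a negative gain: non-monotonicity)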

An independence system $(\mathcal{N},\mathcal{I})$ consists of a finite ground set $\mathcal{N}$ and a family of independent sets $\mathcal{I}\subseteq 2^{\mathcal{N}}$ satisfying: (1) $\emptyset\in\mathcal{I}$; (2) if $A\subseteq B\in\mathcal{I}$, then $A\in\mathcal{I}$ (called the hereditary property). An independence system $(\mathcal{N},\mathcal{I})$ is called a matroid if it satisfies: for any $A\in\mathcal{I},B\in\mathcal{I}$ with $|A|<|B|$, there exists $x\in B\backslash A$ such that $A\cup\{x\}\in\mathcal{I}$ (called the exchange property).

Given an independence system $(\mathcal{N},\mathcal{I})$ and any $X\subseteq Y\subseteq\mathcal{N}$, $X$ is called a base of $Y$ if: (1) $X\in\mathcal{I}$; (2) $\forall x\in Y\backslash X:X\cup\{x\}\notin\mathcal{I}$. The independence system $(\mathcal{N},\mathcal{I})$ is called a $p$-set system if for every $Y\subseteq\mathcal{N}$ and any two bases $X_{1},X_{2}$ of $Y$, we have $|X_{1}|\leq p|X_{2}|$ ($p\geq 1$). It is known that the $p$-set system is a generalization of several structures on independence systems including the matroid, $p$-matchoid and $p$-extendible system, and an inclusion hierarchy of these structures can be found in [32].
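The algorithms in this paper access the constraint only through an independence oracle. As a concrete example (our own illustration; the class and method names are ours), the Python sketch below implements the oracle of a partition matroid, where the ground set is split into groups and at most a fixed number of elements may be chosen from each group; this is the kind of constraint used in the network monitoring application of Section 5:

# Illustration only: an independence oracle for a partition matroid.
# A set S is independent iff it contains at most cap[g] elements of each group g.

class PartitionMatroid:
    def __init__(self, group_of, cap):
        self.group_of = group_of      # dict: element -> group id
        self.cap = cap                # dict: group id -> capacity

    def is_independent(self, S):
        counts = {}
        for e in S:
            g = self.group_of[e]
            counts[g] = counts.get(g, 0) + 1
            if counts[g] > self.cap[g]:
                return False
        return True

# Example: elements 0..5 in two groups, at most 2 elements per group.
matroid = PartitionMatroid({e: e % 2 for e in range(6)}, {0: 2, 1: 2})
assert matroid.is_independent({0, 1, 2}) and not matroid.is_independent({0, 2, 4})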

In this paper, we consider a non-monotone, non-negative submodular function $f(\cdot)$. Given $f(\cdot)$ and a matroid $(\mathcal{N},\mathcal{I})$, our optimization problem is $\max\{f(S):S\in\mathcal{I}\}$. At the end of Section 4, we will also consider a more general case where $(\mathcal{N},\mathcal{I})$ is a $p$-set system.

We introduce some frequently used properties of submodular functions. For any $X\subseteq Y\subseteq\mathcal{N}$ and any $Z\subseteq\mathcal{N}\backslash Y$, we have $f(Z\mid Y)\leq f(Z\mid X)$; this property can be derived from the definition of submodular functions. For any $X,Y\subseteq\mathcal{N}$ and a partition $Z_{1},Z_{2},\cdots,Z_{t}$ of $Y\backslash X$, we have

$$f(Y\mid X)=\sum\nolimits_{j=1}^{t}f(Z_{j}\mid Z_{1}\cup\cdots\cup Z_{j-1}\cup X)\leq\sum\nolimits_{j=1}^{t}f(Z_{j}\mid X) \qquad (1)$$

For convenience, we use $[h]$ to denote $\{1,\cdots,h\}$ for any positive integer $h$, use $r$ to denote the rank of $(\mathcal{N},\mathcal{I})$, i.e., $r=\max\{|S|:S\in\mathcal{I}\}$, and denote an optimal solution to our problem by $O$.

3 The TwinGreedy Algorithm

In this section, we consider a matroid constraint and introduce the TwinGreedy algorithm (Alg. 1), which achieves a $1/4$ approximation ratio. The TwinGreedy algorithm maintains two solution sets $S_{1}$ and $S_{2}$ that are initialized to empty sets. At each iteration, it considers all the candidate elements in $\mathcal{N}\backslash(S_{1}\cup S_{2})$ that can be added into $S_{1}$ or $S_{2}$ without violating the feasibility of $\mathcal{I}$. If such an element exists, it greedily selects the pair $(e,S_{i})$ with $i\in\{1,2\}$ such that adding $e$ into $S_{i}$ brings the maximal marginal gain $f(e\mid S_{i})$ without violating the feasibility of $\mathcal{I}$. TwinGreedy terminates when no more elements can be added into $S_{1}$ or $S_{2}$ with a positive marginal gain while still keeping the feasibility of $\mathcal{I}$. It then returns the one of $S_{1}$ and $S_{2}$ with the larger objective function value.

1:  $S_{1}\leftarrow\emptyset$; $S_{2}\leftarrow\emptyset$;
2:  repeat
3:      $\mathcal{M}_{1}\leftarrow\{e\in\mathcal{N}\backslash(S_{1}\cup S_{2}):S_{1}\cup\{e\}\in\mathcal{I}\}$
4:      $\mathcal{M}_{2}\leftarrow\{e\in\mathcal{N}\backslash(S_{1}\cup S_{2}):S_{2}\cup\{e\}\in\mathcal{I}\}$
5:      $C\leftarrow\{j\mid j\in\{1,2\}\wedge\mathcal{M}_{j}\neq\emptyset\}$
6:      if $C\neq\emptyset$ then
7:          $(i,e)\leftarrow\arg\max_{j\in C,u\in\mathcal{M}_{j}}{f(u\mid S_{j})}$; (ties broken arbitrarily)
8:          if $f(e\mid S_{i})\leq 0$ then Break;
9:          $S_{i}\leftarrow S_{i}\cup\{e\}$;
10: until $\mathcal{M}_{1}\cup\mathcal{M}_{2}=\emptyset$;
11: $S^{*}\leftarrow\arg\max_{X\in\{S_{1},S_{2}\}}f(X)$
12: return $S^{*}$
Algorithm 1: $\mathsf{TwinGreedy}(\mathcal{N},\mathcal{I},f(\cdot))$
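Algorithm 1 translates almost line by line into code. The Python sketch below (our own rendering, assuming a value oracle f and an independence oracle is_independent such as the ones sketched in Section 2) is meant only to illustrate the control flow; each added element costs $\mathcal{O}(n)$ marginal-gain queries, in line with the $\mathcal{O}(nr)$ running time claimed for TwinGreedy.

# Sketch of Algorithm 1 (TwinGreedy); `ground` is an iterable of elements,
# `is_independent(S)` the matroid oracle, and `f(S)` the value oracle.
def twin_greedy(ground, is_independent, f):
    S = [set(), set()]                       # the two "twin" solutions S_1, S_2
    while True:
        best = None                          # best (gain, side i, element e) of this round
        for e in ground:
            if e in S[0] or e in S[1]:
                continue
            for i in (0, 1):
                if is_independent(S[i] | {e}):
                    gain = f(S[i] | {e}) - f(S[i])
                    if best is None or gain > best[0]:
                        best = (gain, i, e)
        if best is None or best[0] <= 0:     # M_1 ∪ M_2 is empty, or best marginal gain <= 0
            break
        _, i, e = best
        S[i].add(e)
    return max(S, key=f)                     # return the better of S_1, S_2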

Although the TwinGreedy algorithm is simple, its performance analysis is highly non-trivial. Roughly speaking, as TwinGreedy adopts a greedy strategy to select elements, we try to find some “competitive relationships” between the elements in $O$ and those in $S_{1}\cup S_{2}$, such that the total marginal gains of the elements in $O$ with respect to $S_{1}$ and $S_{2}$ can be upper-bounded by $S_{1}$ and $S_{2}$’s own objective function values. However, this is non-trivial due to the correlation between the elements in $S_{1}$ and $S_{2}$. To overcome this hurdle, we first classify the elements in $O$, as shown in Definition 1:

Definition 1

Consider the two solution sets $S_{1}$ and $S_{2}$ when TwinGreedy returns. We can write $S_{1}\cup S_{2}$ as $\{v_{1},v_{2},\cdots,v_{k}\}$ where $k=|S_{1}\cup S_{2}|$, such that $v_{t}$ is added into $S_{1}\cup S_{2}$ by the algorithm before $v_{s}$ for any $1\leq t<s\leq k$. With this ordered list, given any $e=v_{j}\in S_{1}\cup S_{2}$, we define

$$\mathrm{Pre}(e,S_{1})=\{v_{1},\cdots,v_{j-1}\}\cap S_{1};~~~\mathrm{Pre}(e,S_{2})=\{v_{1},\cdots,v_{j-1}\}\cap S_{2}. \qquad (2)$$

That is, $\mathrm{Pre}(e,S_{i})$ denotes the set of elements in $S_{i}$ ($i\in\{1,2\}$) that are added by the TwinGreedy algorithm before adding $e$. Furthermore, we define

$$\begin{aligned}
O_{1}^{+}&=\{e\in O\cap S_{1}:\mathrm{Pre}(e,S_{2})\cup\{e\}\in\mathcal{I}\}; & O_{1}^{-}&=\{e\in O\cap S_{1}:\mathrm{Pre}(e,S_{2})\cup\{e\}\notin\mathcal{I}\}\\
O_{2}^{+}&=\{e\in O\cap S_{2}:\mathrm{Pre}(e,S_{1})\cup\{e\}\in\mathcal{I}\}; & O_{2}^{-}&=\{e\in O\cap S_{2}:\mathrm{Pre}(e,S_{1})\cup\{e\}\notin\mathcal{I}\}\\
O_{3}&=\{e\in O\backslash(S_{1}\cup S_{2}):S_{1}\cup\{e\}\notin\mathcal{I}\}; & O_{4}&=\{e\in O\backslash(S_{1}\cup S_{2}):S_{2}\cup\{e\}\notin\mathcal{I}\}
\end{aligned}$$

We also define the marginal gain of any $e\in S_{1}\cup S_{2}$ as $\delta(e)=f(e\mid\mathrm{Pre}(e,S_{1}))\cdot\mathbf{1}_{S_{1}}(e)+f(e\mid\mathrm{Pre}(e,S_{2}))\cdot\mathbf{1}_{S_{2}}(e)$, where $\mathbf{1}_{S_{i}}(e)=1$ if $e\in S_{i}$ and $\mathbf{1}_{S_{i}}(e)=0$ otherwise ($\forall i\in\{1,2\}$).

Intuitively, each element $e\in O_{1}^{+}\subseteq S_{1}$ can also be added into $S_{2}$ without violating the feasibility of $\mathcal{I}$ when $e$ is added into $S_{1}$, while the elements in $O_{1}^{-}\subseteq S_{1}$ do not have this nice property. The sets $O_{2}^{+}$ and $O_{2}^{-}$ can be understood similarly. With the above classification of the elements in $O$, we further consider two groups of elements: $O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}$ and $O_{1}^{-}\cup O_{2}^{+}\cup O_{2}^{-}\cup O_{4}$. By leveraging the properties of independence systems, we can map the first group to $S_{1}$ and the second group to $S_{2}$, as shown by Lemma 1 (the proof can be found in the supplementary file). Intuitively, Lemma 1 holds due to the exchange property of matroids.

Lemma 1

There exists an injective function $\pi_{1}:O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}\mapsto S_{1}$ such that:

  1. For any $e\in O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}$, we have $\mathrm{Pre}(\pi_{1}(e),S_{1})\cup\{e\}\in\mathcal{I}$.

  2. For each $e\in O_{1}^{+}\cup O_{1}^{-}$, we have $\pi_{1}(e)=e$.

Similarly, there exists an injective function $\pi_{2}:O_{1}^{-}\cup O_{2}^{+}\cup O_{2}^{-}\cup O_{4}\mapsto S_{2}$ such that $\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I}$ for each $e\in O_{1}^{-}\cup O_{2}^{+}\cup O_{2}^{-}\cup O_{4}$ and $\pi_{2}(e)=e$ for each $e\in O_{2}^{+}\cup O_{2}^{-}$.

The first property shown in Lemma 1 implies that, at the moment that $\pi_{1}(e)$ is added into $S_{1}$, $e$ can also be added into $S_{1}$ without violating the feasibility of $\mathcal{I}$ (for any $e\in O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}$). This makes it possible to compare the marginal gain of $e$ with respect to $S_{1}$ with that of $\pi_{1}(e)$. The construction of $\pi_{2}(\cdot)$ is also based on this intuition. With the two injections $\pi_{1}(\cdot)$ and $\pi_{2}(\cdot)$, we can bound the marginal gains of $O_{1}^{+}$ to $O_{4}$ with respect to $S_{1}$ and $S_{2}$, as shown by Lemma 2. Lemma 2 can be proved by using Definition 1, Lemma 1 and the submodularity of $f(\cdot)$.

Figure 1: Illustration on the mappings constructed for performance analysis, where the hollow arrows denote $\pi_{1}(\cdot)$ and the solid arrows denote $\pi_{2}(\cdot)$.
Lemma 2

The TwinGreedy algorithm satisfies:

$$\begin{aligned}
f(O_{1}^{+}\mid S_{2})&\leq\sum\nolimits_{e\in O_{1}^{+}}\delta(\pi_{1}(e)); & f(O_{2}^{+}\mid S_{1})&\leq\sum\nolimits_{e\in O_{2}^{+}}\delta(\pi_{2}(e)) & (3)\\
f(O_{1}^{-}\mid S_{2})&\leq\sum\nolimits_{e\in O_{1}^{-}}\delta(\pi_{2}(e)); & f(O_{2}^{-}\mid S_{1})&\leq\sum\nolimits_{e\in O_{2}^{-}}\delta(\pi_{1}(e)) & (4)\\
f(O_{4}\mid S_{2})&\leq\sum\nolimits_{e\in O_{4}}\delta(\pi_{2}(e)); & f(O_{3}\mid S_{1})&\leq\sum\nolimits_{e\in O_{3}}\delta(\pi_{1}(e)) & (5)
\end{aligned}$$

where $\pi_{1}(\cdot)$ and $\pi_{2}(\cdot)$ are the two functions defined in Lemma 1.

The proof of Lemma 2 is deferred to the supplementary file. In the next section, we will provide a proof sketch for a similar lemma (i.e., Lemma 3), which can also be used to understand Lemma 2. Now we can prove the performance bounds of TwinGreedy:

Theorem 1

When $(\mathcal{N},\mathcal{I})$ is a matroid, the $\mathsf{TwinGreedy}$ algorithm returns a solution $S^{*}$ with a $\frac{1}{4}$ approximation ratio, under a time complexity of $\mathcal{O}(nr)$.

Proof:  If $S_{1}=\emptyset$ or $S_{2}=\emptyset$, then we get an optimal solution, as shown in the supplementary file. So we assume $S_{1}\neq\emptyset$ and $S_{2}\neq\emptyset$. Let $O_{5}=O\backslash(S_{1}\cup S_{2}\cup O_{3})$ and $O_{6}=O\backslash(S_{1}\cup S_{2}\cup O_{4})$. By Eqn. (1), we get

$$\begin{aligned}
f(O\cup S_{1})-f(S_{1})&\leq f(O_{2}^{+}\mid S_{1})+f(O_{2}^{-}\mid S_{1})+f(O_{3}\mid S_{1})+f(O_{5}\mid S_{1}) & (6)\\
f(O\cup S_{2})-f(S_{2})&\leq f(O_{1}^{+}\mid S_{2})+f(O_{1}^{-}\mid S_{2})+f(O_{4}\mid S_{2})+f(O_{6}\mid S_{2}) & (7)
\end{aligned}$$

Using Lemma 2, we can get

$$\begin{aligned}
&f(O_{2}^{+}\mid S_{1})+f(O_{2}^{-}\mid S_{1})+f(O_{3}\mid S_{1})+f(O_{1}^{+}\mid S_{2})+f(O_{1}^{-}\mid S_{2})+f(O_{4}\mid S_{2})\\
&\leq\sum\nolimits_{e\in O_{1}^{+}\cup O_{2}^{-}\cup O_{3}}\delta(\pi_{1}(e))+\sum\nolimits_{e\in O_{1}^{-}\cup O_{2}^{+}\cup O_{4}}\delta(\pi_{2}(e))\\
&\leq\sum\nolimits_{e\in S_{1}}\delta(e)+\sum\nolimits_{e\in S_{2}}\delta(e)\leq f(S_{1})+f(S_{2}), & (8)
\end{aligned}$$

where the second inequality holds because $\pi_{1}:O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}\mapsto S_{1}$ and $\pi_{2}:O_{1}^{-}\cup O_{2}^{+}\cup O_{2}^{-}\cup O_{4}\mapsto S_{2}$ are both injective functions as shown in Lemma 1, and because $\delta(e)>0$ for every $e\in S_{1}\cup S_{2}$ by the stopping rule of TwinGreedy, which never adds an element with a non-positive marginal gain. Besides, according to the definition of $O_{5}$, we must have $f(e\mid S_{1})\leq 0$ for each $e\in O_{5}$, because otherwise $e$ would have been added into $S_{1}\cup S_{2}$ as $S_{1}\cup\{e\}\in\mathcal{I}$. Similarly, we get $f(e\mid S_{2})\leq 0$ for each $e\in O_{6}$. Therefore, we have

$$f(O_{5}\mid S_{1})\leq\sum\nolimits_{e\in O_{5}}f(e\mid S_{1})\leq 0;~~~~f(O_{6}\mid S_{2})\leq\sum\nolimits_{e\in O_{6}}f(e\mid S_{2})\leq 0 \qquad (9)$$

Meanwhile, as $f(\cdot)$ is a non-negative submodular function and $S_{1}\cap S_{2}=\emptyset$, we have

$$f(O)\leq f(O)+f(O\cup S_{1}\cup S_{2})\leq f(O\cup S_{1})+f(O\cup S_{2}) \qquad (10)$$

By summing up Eqn. (6)-Eqn. (10) and simplifying, we get

$$f(O)\leq 2f(S_{1})+2f(S_{2})\leq 4f(S^{*}) \qquad (11)$$

which completes the proof of the $1/4$ ratio. Finally, the $\mathcal{O}(nr)$ time complexity is evident, as $|S_{1}|+|S_{2}|\leq 2r$ and adding one element into $S_{1}$ or $S_{2}$ requires at most $\mathcal{O}(n)$ marginal-gain evaluations. $\square$

Interestingly, when the objective function $f(\cdot)$ is monotone, the proof of Theorem 1 shows that the TwinGreedy algorithm achieves a $1/2$ approximation ratio due to $f(O\cup S_{1})+f(O\cup S_{2})\geq 2f(O)$. Therefore, we rediscover the $1/2$ deterministic ratio proposed by Fisher et al. [27] for monotone submodular maximization over a matroid.
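For completeness, the short derivation behind this remark (our own restatement of an argument already contained in the proof of Theorem 1) is as follows. Summing (6) and (7) and applying (8) and (9) gives $f(O\cup S_{1})+f(O\cup S_{2})\leq 2f(S_{1})+2f(S_{2})$; when $f(\cdot)$ is monotone we also have $f(O\cup S_{i})\geq f(O)$ for $i\in\{1,2\}$, so

$$2f(O)\leq f(O\cup S_{1})+f(O\cup S_{2})\leq 2f(S_{1})+2f(S_{2})\leq 4f(S^{*}),$$

i.e., $f(S^{*})\geq\frac{1}{2}f(O)$.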

4 The TwinGreedyFast Algorithm

As the TwinGreedy algorithm still has quadratic running time, we present a more efficient algorithm TwinGreedyFast (Alg. 2). Like TwinGreedy, the TwinGreedyFast algorithm also maintains two solution sets $S_{1}$ and $S_{2}$, but it uses a threshold to control the quality of the elements added into $S_{1}$ or $S_{2}$. More specifically, given a threshold $\tau$, TwinGreedyFast checks every unselected element $e\in\mathcal{N}\backslash(S_{1}\cup S_{2})$ in an arbitrary order. It then chooses $S_{i}\in\{S_{1},S_{2}\}$ such that adding $e$ into $S_{i}$ does not violate the feasibility of $\mathcal{I}$ while $f(e\mid S_{i})$ is maximized. If the marginal gain $f(e\mid S_{i})$ is no less than $\tau$, then $e$ is added into $S_{i}$; otherwise the algorithm simply neglects $e$. The algorithm repeats the above process starting from $\tau=\tau_{\max}$ and decreases $\tau$ by a factor of $(1+\epsilon)$ at each iteration until $\tau$ is sufficiently small, then it returns the one of $S_{1}$ and $S_{2}$ with the larger objective value.

1:  $\tau_{\max}\leftarrow\max\{f(e):e\in\mathcal{N}\wedge\{e\}\in\mathcal{I}\}$;
2:  $S_{1}\leftarrow\emptyset$; $S_{2}\leftarrow\emptyset$;
3:  for ($\tau\leftarrow\tau_{\max}$; $\tau>{\epsilon\tau_{\max}}/{[r(1+\epsilon)]}$; $\tau\leftarrow\tau/(1+\epsilon)$) do
4:      foreach $e\in\mathcal{N}\backslash(S_{1}\cup S_{2})$ do
5:          $\Delta_{1}\leftarrow-\infty$; $\Delta_{2}\leftarrow-\infty$    /* two signals */
6:          if $S_{1}\cup\{e\}\in\mathcal{I}$ then $\Delta_{1}\leftarrow f(e\mid S_{1})$;
7:          if $S_{2}\cup\{e\}\in\mathcal{I}$ then $\Delta_{2}\leftarrow f(e\mid S_{2})$;
8:          $i\leftarrow\arg\max_{j\in\{1,2\}}\Delta_{j}$; (ties broken arbitrarily)
9:          if $\Delta_{i}\geq\tau$ then $S_{i}\leftarrow S_{i}\cup\{e\}$;
10: $S^{*}\leftarrow\arg\max_{X\in\{S_{1},S_{2}\}}f(X)$
11: return $S^{*}$
Algorithm 2: $\mathsf{TwinGreedyFast}(\mathcal{N},\mathcal{I},f(\cdot),\epsilon)$
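The thresholding loop of Algorithm 2 can be written just as compactly. The Python sketch below (again our own rendering, using the same hypothetical oracles as before; r is the rank of the matroid) makes one pass over the ground set per threshold value, in line with the nearly-linear $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$ query bound.

# Sketch of Algorithm 2 (TwinGreedyFast) with geometrically decreasing thresholds.
def twin_greedy_fast(ground, is_independent, f, r, eps=0.1):
    tau_max = max((f({e}) for e in ground if is_independent({e})), default=0.0)
    S = [set(), set()]
    tau = tau_max
    while tau > eps * tau_max / (r * (1 + eps)):
        for e in ground:
            if e in S[0] or e in S[1]:
                continue
            # marginal gain of e w.r.t. each side; -inf marks an infeasible side
            gains = [f(S[i] | {e}) - f(S[i]) if is_independent(S[i] | {e}) else float("-inf")
                     for i in (0, 1)]
            i = 0 if gains[0] >= gains[1] else 1
            if gains[i] >= tau:              # accept e only if its gain clears the threshold
                S[i].add(e)
        tau /= 1 + eps
    return max(S, key=f)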

The performance analysis of TwinGreedyFast is similar to that of TwinGreedy. Let us consider the two solution sets $S_{1}$ and $S_{2}$ when TwinGreedyFast returns. We can define $O_{1}^{+}$, $O_{1}^{-}$, $O_{2}^{+}$, $O_{2}^{-}$, $O_{3}$, $O_{4}$, $\mathrm{Pre}(e,S_{i})$ and $\delta(e)$ in exactly the same way as in Definition 1, and Lemma 1 still holds, as the construction of $\pi_{1}(\cdot)$ and $\pi_{2}(\cdot)$ only depends on the insertion order of the elements in $S_{1}\cup S_{2}$, which is fixed when TwinGreedyFast returns. We also try to bound the marginal gains of $O_{1}^{+}$-$O_{4}$ with respect to $S_{1}$ and $S_{2}$ in a way similar to Lemma 2. However, due to the thresholds introduced in TwinGreedyFast, these bounds are slightly different from those in Lemma 2, as shown in Lemma 3:

Lemma 3

The TwinGreedyFast algorithm satisfies:

$$\begin{aligned}
f(O_{1}^{+}\mid S_{2})&\leq\sum\nolimits_{e\in O_{1}^{+}}\delta(\pi_{1}(e)); & f(O_{2}^{+}\mid S_{1})&\leq\sum\nolimits_{e\in O_{2}^{+}}\delta(\pi_{2}(e)) & (12)\\
f(O_{1}^{-}\mid S_{2})&\leq(1+\epsilon)\sum\nolimits_{e\in O_{1}^{-}}\delta(\pi_{2}(e)); & f(O_{2}^{-}\mid S_{1})&\leq(1+\epsilon)\sum\nolimits_{e\in O_{2}^{-}}\delta(\pi_{1}(e)) & (13)\\
f(O_{4}\mid S_{2})&\leq(1+\epsilon)\sum\nolimits_{e\in O_{4}}\delta(\pi_{2}(e)); & f(O_{3}\mid S_{1})&\leq(1+\epsilon)\sum\nolimits_{e\in O_{3}}\delta(\pi_{1}(e)) & (14)
\end{aligned}$$

where $\pi_{1}(\cdot)$ and $\pi_{2}(\cdot)$ are the two functions defined in Lemma 1.

Proof:  (sketch) At the moment that any $e\in O_{1}^{+}$ is added into $S_{1}$, it can also be added into $S_{2}$ due to Definition 1. Therefore, we must have $f(e\mid\mathrm{Pre}(e,S_{2}))\leq\delta(e)$ according to the greedy selection rule of TwinGreedyFast. Using submodularity, we get the first inequality in the lemma:

$$f(O_{1}^{+}\mid S_{2})\leq\sum\nolimits_{e\in O_{1}^{+}}f(e\mid S_{2})\leq\sum\nolimits_{e\in O_{1}^{+}}f(e\mid\mathrm{Pre}(e,S_{2}))\leq\sum\nolimits_{e\in O_{1}^{+}}\delta(\pi_{1}(e))$$

For any $e\in O_{1}^{-}$, consider the moment that TwinGreedyFast adds $\pi_{2}(e)$ into $S_{2}$. According to Lemma 1, we have $\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I}$. This implies that $e$ has not been added into $S_{1}$ yet, because otherwise we would have $\mathrm{Pre}(e,S_{2})\subseteq\mathrm{Pre}(\pi_{2}(e),S_{2})$ and hence $\mathrm{Pre}(e,S_{2})\cup\{e\}\in\mathcal{I}$ according to the hereditary property of independence systems, which contradicts $e\in O_{1}^{-}$. As such, we must have $\delta(\pi_{2}(e))\geq\tau$ and $f(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))\leq(1+\epsilon)\tau$, where $\tau$ is the threshold used by TwinGreedyFast when adding $\pi_{2}(e)$, because otherwise $e$ would have been added before $\pi_{2}(e)$ at an earlier stage of the algorithm with a larger threshold. Combining these results gives us

$$f(O_{1}^{-}\mid S_{2})\leq\sum\nolimits_{e\in O_{1}^{-}}f(e\mid S_{2})\leq\sum\nolimits_{e\in O_{1}^{-}}f(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))\leq(1+\epsilon)\sum\nolimits_{e\in O_{1}^{-}}\delta(\pi_{2}(e))$$

The other inequalities in the lemma can be proved similarly. $\square$

With Lemma 3, we can use reasoning similar to that in the proof of Theorem 1 to establish the performance bounds of TwinGreedyFast, as shown in Theorem 2. The full proofs of Lemma 3 and Theorem 2 can be found in the supplementary file.

Theorem 2

When $(\mathcal{N},\mathcal{I})$ is a matroid, the $\mathsf{TwinGreedyFast}$ algorithm returns a solution $S^{*}$ with a $\frac{1}{4}-\epsilon$ approximation ratio, under a time complexity of $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$.

Extensions:

The TwinGreedyFast algorithm can also be directly used to address the problem of non-monotone submodular maximization subject to a $p$-set system constraint (by simply feeding a $p$-set system $(\mathcal{N},\mathcal{I})$ into TwinGreedyFast). In such a case it achieves a $\frac{1}{2p+2}-\epsilon$ deterministic ratio under $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$ time complexity, which improves upon both the ratio and the time complexity of the results in [44, 31]. We note that the prior best-known result for this problem is a $\left(\frac{1}{p+2\sqrt{p}+3}-\epsilon\right)$-approximation under $\mathcal{O}((nr+r/\epsilon)\sqrt{p})$ time complexity proposed in [25]. Compared to this best-known result, TwinGreedyFast has a much smaller time complexity, and also has a better approximation ratio when $p\leq 5$. The proof of this performance ratio of TwinGreedyFast (shown in the supplementary file) is almost the same as those of Theorems 1-2, as we only need to relax Lemma 1 to allow the preimage under $\pi_{1}(\cdot)$ or $\pi_{2}(\cdot)$ of any element in $S_{1}\cup S_{2}$ to contain at most $p$ elements.
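As a quick numerical sanity check of the last claim (our own arithmetic, ignoring the $\epsilon$ terms): $\frac{1}{2p+2}>\frac{1}{p+2\sqrt{p}+3}$ holds iff $2p+2<p+2\sqrt{p}+3$, i.e., iff $p<3+2\sqrt{2}\approx 5.83$. For instance,

$$p=5:~\tfrac{1}{2p+2}=\tfrac{1}{12}\approx 0.083>\tfrac{1}{5+2\sqrt{5}+3}\approx 0.080;\qquad p=6:~\tfrac{1}{14}\approx 0.071<\tfrac{1}{6+2\sqrt{6}+3}\approx 0.072.$$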

5 Performance Evaluation

In this section, we evaluate the performance of TwinGreedyFast under two social network applications, using both synthetic and real networks. We implement the following algorithms for comparison:

  1. SampleGreedy: This algorithm is proposed in [25]; it has a $1/4$ expected ratio and $\mathcal{O}(nr)$ running time. To the best of our knowledge, it is currently the fastest randomized algorithm for non-monotone submodular maximization over a matroid.

  2. Fantom: Excluding the $1/4-\epsilon$ ratio proposed in [38], the Fantom algorithm from [44] has the best deterministic ratio for our problem. As it needs to call an unconstrained submodular maximization (USM) algorithm, we use the USM algorithm proposed in [12] with a $1/3$ deterministic ratio and linear running time. As such, Fantom achieves a $1/7$ deterministic ratio in the experiments. (We have also tested Fantom using the randomized USM algorithm in [12] with a $1/2$ expected ratio. The experimental results are almost identical, although Fantom has a larger $1/6$ ratio (in expectation) in such a case.)

  3. ResidualRandomGreedy: This algorithm is from [11]; it has a $1/4$ expected ratio and $\mathcal{O}(nr\log n)$ running time. We denote it as “RRG” for brevity.

  4. TwinGreedyFast: We implement our Algorithm 2 with $\epsilon=0.1$. As such, it achieves a 0.15 deterministic approximation ratio in the experiments.

5.1 Applications

Given a social network $G=(V,E)$ where $V$ is the set of nodes and $E$ is the set of edges, we consider the following two applications in our experiments. Both applications are instances of non-monotone submodular maximization subject to a matroid constraint, as proved in the supplementary file.

  1. Social Network Monitoring: This application is similar to the applications considered in [40] and [36]. Suppose that each edge $(u,v)\in E$ is associated with a weight $w(u,v)$ denoting the maximum amount of content that can be propagated through $(u,v)$. Moreover, $V$ is partitioned into $h$ disjoint subsets $V_{1},V_{2},\cdots,V_{h}$ according to the users’ properties such as ages and political leanings. We need to select a set of users $S\subseteq V$ to monitor the network, such that the total amount of monitored content $f(S)=\sum_{(u,v)\in E,u\in S,v\notin S}w(u,v)$ is maximized. Due to considerations of diversity or fairness, we also require $\forall i\in[h]:|S\cap V_{i}|\leq k$, where $k$ is a predefined constant. (A small code sketch of this objective and its matroid constraint is given after this list.)

  2. Multi-Product Viral Marketing: This application is a variation of the one considered in [18]. Suppose that a company with a budget $B$ needs to select a set of at most $k$ seed nodes to promote $m$ products. Each node $u\in V$ can be selected as a seed for at most one product and has a cost $c(u)$ for serving as a seed. The goal is to maximize the revenue (with budget-saving considerations): $\sum\nolimits_{i\in[m]}f_{i}(S_{i})+\left(B-\sum_{i\in[m]}\sum\nolimits_{v\in S_{i}}c(v)\right)$, where $S_{i}$ is the set of seed nodes selected for product $i$, and $f_{i}(\cdot)$ is a monotone and submodular influence spread function as proposed in [34]. We also assume that $B$ is large enough to keep the revenue non-negative, and that selecting no seed nodes yields zero revenue.
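For concreteness, the following Python sketch (our own illustration, not the authors' experiment code) builds the monitoring objective of item 1 on a small random graph and feeds it to the twin_greedy_fast and PartitionMatroid sketches from Sections 2 and 4; the parameter values here (100 nodes, five groups, $k=3$) are hypothetical and chosen only to keep the example small.

# Illustration only: the monitoring objective f(S) = sum of w(u,v) over edges
# with u in S and v not in S, under the partition constraint |S ∩ V_i| <= k.
import random

def make_monitoring_objective(directed_edges):
    def f(S):
        S = set(S)
        return sum(w for u, v, w in directed_edges if u in S and v not in S)
    return f

random.seed(0)
nodes = list(range(100))
undirected = [(u, v, random.random()) for u in nodes for v in nodes
              if u < v and random.random() < 0.1]
directed = undirected + [(v, u, w) for u, v, w in undirected]   # count both directions
group_of = {v: v % 5 for v in nodes}                            # five groups
matroid = PartitionMatroid(group_of, {g: 3 for g in range(5)})  # at most k = 3 per group

f = make_monitoring_objective(directed)
S = twin_greedy_fast(nodes, matroid.is_independent, f, r=15, eps=0.1)
print(len(S), round(f(S), 2))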

Figure 2: Experimental results for social network monitoring and multi-product viral marketing, where “TGF” and “SG” are abbreviations for “TwinGreedyFast” and “SampleGreedy”, respectively.

5.2 Experimental Results

The experimental results are shown in Fig. 2. Overall, TwinGreedyFast runs more than an order of magnitude faster than the other three algorithms; Fantom has the best performance on utility; and the utility of TwinGreedyFast is close to that of Fantom.

5.2.1 Social Network Monitoring

We generate an Erdős–Rényi (ER) random graph with 3000 nodes and set the edge probability to $0.5$. The weight $w(u,v)$ for each $(u,v)\in E$ is generated uniformly at random from $[0,1]$, and all nodes are randomly assigned to five groups. In the supplementary file, we also generate Barabási–Albert (BA) graphs for comparison, and the results are qualitatively similar. It can be seen from Fig. 2(a) that Fantom incurs the largest number of queries to the objective function, as it leverages repeated greedy searching processes to find a solution with good quality. SampleGreedy runs faster than RRG, which coincides with their time complexities mentioned in Section 1.1. TwinGreedyFast significantly outperforms Fantom, RRG and SampleGreedy by more than an order of magnitude in Fig. 2(a) and Fig. 2(b), as it achieves nearly linear time complexity. Moreover, it can be seen from Figs. 2(a)-(b) that TwinGreedyFast maintains its advantage in efficiency whether the metric is wall-clock running time or the number of queries. Finally, Fig. 2(c) shows that all the implemented algorithms achieve approximately the same utility on the ER random graph.

5.2.2 Multi-Product Viral Marketing

We use two real social networks, Flixster and Epinions. Flixster is from [6] with 137925 nodes and 2538746 edges, and Epinions is from the SNAP dataset collection [39] with 75879 nodes and 508837 edges. We consider three products and follow the Independent Cascade (IC) model [34] to set the influence spread function $f_{i}(\cdot)$ for each product $i$. Note that the IC model requires an activation probability $p_{u,v}$ associated with each edge $(u,v)\in E$. This probability is available in the Flixster dataset (learned from real users’ action logs), and we follow Chen et al. [17] to set $p_{u,v}=1/|N_{in}(v)|$ for the Epinions dataset, where $N_{in}(v)$ is the set of in-neighbors of $v$. As evaluating $f_{i}(\cdot)$ under the IC model is NP-hard, we follow the approach in [7] to generate a set of one million random Reverse-Reachable sets (RR-sets) such that $f_{i}(\cdot)$ can be approximately evaluated using these RR-sets. The cost $c(u)$ of each node is generated uniformly at random from $[0,1]$, and we set $B=m\sum_{u\in V}c(u)$ to keep the function value non-negative. More implementation details can be found in the supplementary file.
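For readers unfamiliar with the RR-set technique of [7], the Python sketch below (a heavily simplified illustration of ours, not the exact implementation used in our experiments) shows the basic idea: each RR set is obtained by a reverse random traversal of the IC process from a uniformly random node, and the spread of a seed set is estimated as $n$ times the fraction of RR sets it hits.

# Simplified sketch of influence-spread estimation via reverse-reachable (RR) sets.
import random
from collections import deque

def sample_rr_set(n, in_neighbors, p):
    """Reverse BFS from a random root; edge (u, v) survives with probability p[(u, v)]."""
    root = random.randrange(n)
    rr, queue = {root}, deque([root])
    while queue:
        v = queue.popleft()
        for u in in_neighbors.get(v, ()):
            if u not in rr and random.random() < p[(u, v)]:
                rr.add(u)
                queue.append(u)
    return rr

def make_spread_estimator(n, rr_sets):
    def spread(S):
        S = set(S)
        return n * sum(1 for rr in rr_sets if rr & S) / len(rr_sets)
    return spread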

We plot the experimental results on viral marketing in Fig. 2(d)-Fig. 2(i). The results are qualitatively similar to those in Fig. 2(a)-Fig. 2(c), which show that TwinGreedyFast is still significantly faster than the other algorithms. Fantom achieves the largest utility again, while TwinGreedyFast performs closely to Fantom and outperforms the other two randomized algorithms on utility.

6 Conclusion and Discussion

We have proposed the first deterministic algorithm to achieve an approximation ratio of $1/4$ for maximizing a non-monotone, non-negative submodular function subject to a matroid constraint, and our algorithm can also be accelerated to achieve nearly-linear running time. In contrast to the existing algorithms adopting the “repeated greedy-search” framework proposed by [31], our algorithms are designed based on a novel “simultaneous greedy-search” framework, where two candidate solutions are constructed simultaneously, and a pair of an element and a candidate solution is greedily selected at each step to maximize the marginal gain. Moreover, our algorithms can also be directly used to handle a more general $p$-set system constraint or monotone submodular functions, while still achieving nice performance bounds. For example, by reasoning similar to that in Sections 3-4, it can be seen that our algorithms achieve a $\frac{1}{p+1}$ ratio for monotone $f(\cdot)$ under a $p$-set system constraint, which is almost the best possible [2]. We have evaluated the performance of our algorithms in two concrete applications for social network monitoring and multi-product viral marketing, and the extensive experimental results demonstrate that our algorithms run orders of magnitude faster than the state-of-the-art algorithms, while achieving approximately the same utility.

Acknowledgements

This work was supported by the National Key R&D Program of China under Grant No. 2018AAA0101204, the National Natural Science Foundation of China (NSFC) under Grant No. 61772491 and Grant No. U1709217, the Anhui Initiative in Quantum Information Technologies under Grant No. AHY150300, and the Fundamental Research Funds for the Central Universities.

Broader Impact

Submodular optimization is an important research topic in data mining, machine learning and optimization theory, as it has numerous applications such as crowdsourcing [47], viral marketing [34], feature selection [28], network monitoring [40], document summarization [41, 20], online advertising [49], crowd teaching [46] and blogosphere mining [21]. The matroid is a fundamental structure in combinatorics that captures the essence of a notion of “independence” generalizing linear independence in vector spaces. The matroid structure has been pervasively found in various areas such as geometry, network theory, coding theory and graph theory [8]. The study of submodular maximization subject to matroid constraints dates back to the 1970s (e.g., [27]), and it is still a hot research topic today [44, 25, 45, 5, 16, 3]. A lot of practical problems can be cast as the problem of submodular maximization over a matroid constraint (or more general $p$-set system constraints), such as diversity maximization [1], video summarization [52], clustering [42], multi-robot allocation [51] and planning sensor networks [19]. Therefore, our study addresses a general and fundamental theoretical problem with many potential applications.

Due to the massive datasets used everywhere nowadays, it is very important that submodular optimization algorithms achieve accuracy and efficiency simultaneously. Recently, there has been great interest in designing more practical and efficient algorithms for submodular optimization (e.g., [2, 13, 4, 43, 36, 23, 11]), and our work advances the state of the art in this area by proposing a new efficient algorithm with improved performance bounds. Moreover, our algorithms are based on a novel “simultaneous greedy-search” framework, which is different from the classical “repeated greedy-search” and “local search” frameworks adopted by the state-of-the-art algorithms (e.g., [31, 38, 25, 44]). We believe that our “simultaneous greedy-search” framework has the potential to be extended to address other problems on submodular maximization with more complex constraints, which is the topic of our ongoing research.

References

  • Abbassi et al. [2013] Z. Abbassi, V. S. Mirrokni, and M. Thakur. Diversity maximization under matroid constraints. In KDD, pages 32–40, 2013.
  • Badanidiyuru and Vondrák [2014] A. Badanidiyuru and J. Vondrák. Fast algorithms for maximizing submodular functions. In SODA, pages 1497–1514, 2014.
  • Balcan and Harvey [2018] M.-F. Balcan and N. J. Harvey. Submodular functions: Learnability, structure, and optimization. SIAM Journal on Computing, 47(3):703–754, 2018.
  • Balkanski et al. [2018] E. Balkanski, A. Breuer, and Y. Singer. Non-monotone submodular maximization in exponentially fewer iterations. In NIPS, pages 2353–2364, 2018.
  • Balkanski et al. [2019] E. Balkanski, A. Rubinstein, and Y. Singer. An optimal approximation for submodular maximization under a matroid constraint in the adaptive complexity model. In STOC, pages 66–77, 2019.
  • Barbieri et al. [2012] N. Barbieri, F. Bonchi, and G. Manco. Topic-aware social influence propagation models. In ICDM, pages 81–90, 2012.
  • Borgs et al. [2014] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In SODA, pages 946–957, 2014.
  • Bryant and Perfect [1980] V. Bryant and H. Perfect. Independence Theory in Combinatorics: An Introductory Account with Applications to Graphs and Transversals. Springer, 1980.
  • Buchbinder and Feldman [2018] N. Buchbinder and M. Feldman. Deterministic algorithms for submodular maximization problems. ACM Transactions on Algorithms, 14(3):1–20, 2018.
  • Buchbinder and Feldman [2019] N. Buchbinder and M. Feldman. Constrained submodular maximization via a nonsymmetric technique. Mathematics of Operations Research, 44(3):988–1005, 2019.
  • Buchbinder et al. [2014] N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz. Submodular maximization with cardinality constraints. In SODA, pages 1433–1452, 2014.
  • Buchbinder et al. [2015] N. Buchbinder, M. Feldman, J. Seffi, and R. Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM Journal on Computing, 44(5):1384–1402, 2015.
  • Buchbinder et al. [2017] N. Buchbinder, M. Feldman, and R. Schwartz. Comparing apples and oranges: Query trade-off in submodular maximization. Mathematics of Operations Research, 42(2):308–329, 2017.
  • Buchbinder et al. [2019] N. Buchbinder, M. Feldman, and M. Garg. Deterministic ($1/2+\varepsilon$)-approximation for submodular maximization over a matroid. In SODA, pages 241–254, 2019.
  • Calinescu et al. [2011] G. Calinescu, C. Chekuri, M. Pal, and J. Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.
  • Chekuri and Quanrud [2019] C. Chekuri and K. Quanrud. Parallelizing greedy for submodular set function maximization in matroids and beyond. In STOC, pages 78–89, 2019.
  • Chen et al. [2009] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD, pages 199–208, 2009.
  • Chen et al. [2020] W. Chen, W. Zhang, and H. Zhao. Gradient method for continuous influence maximization with budget-saving considerations. In AAAI, 2020.
  • Corah and Michael [2018] M. Corah and N. Michael. Distributed submodular maximization on partition matroids for planning on large sensor networks. In CDC, pages 6792–6799, 2018.
  • El-Arini and Guestrin [2011] K. El-Arini and C. Guestrin. Beyond keyword search: Discovering relevant scientific literature. In KDD, pages 439–447, 2011.
  • El-Arini et al. [2009] K. El-Arini, G. Veda, D. Shahaf, and C. Guestrin. Turning down the noise in the blogosphere. In KDD, pages 289–298, 2009.
  • Ene and Nguyen [2019] A. Ene and H. L. Nguyen. Towards nearly-linear time algorithms for submodular maximization with a matroid constraint. In ICALP, page 54:1–54:14, 2019.
  • Fahrbach et al. [2019] M. Fahrbach, V. Mirrokni, and M. Zadimoghaddam. Non-monotone submodular maximization with nearly optimal adaptivity and query complexity. In ICML, pages 1833–1842, 2019.
  • Feldman et al. [2011] M. Feldman, J. Naor, and R. Schwartz. A unified continuous greedy algorithm for submodular maximization. In FOCS, pages 570–579, 2011.
  • Feldman et al. [2017] M. Feldman, C. Harshaw, and A. Karbasi. Greed is good: Near-optimal submodular maximization via greedy optimization. In COLT, pages 758–784, 2017.
  • Feldman et al. [2018] M. Feldman, A. Karbasi, and E. Kazemi. Do less, get more: Streaming submodular maximization with subsampling. In NIPS, pages 732–742, 2018.
  • Fisher et al. [1978] M. Fisher, G. Nemhauser, and L. Wolsey. An analysis of approximations for maximizing submodular set functions—ii. Mathematical Programming Study, 8:73–87, 1978.
  • Fujii and Sakaue [2019] K. Fujii and S. Sakaue. Beyond adaptive submodularity: Approximation guarantees of greedy policy with adaptive submodularity ratio. In ICML, pages 2042–2051, 2019.
  • Gharan and Vondrák [2011] S. O. Gharan and J. Vondrák. Submodular maximization by simulated annealing. In SODA, pages 1098–1116, 2011.
  • Gomes and Krause [2010] R. Gomes and A. Krause. Budgeted nonparametric learning from data streams. In ICML, page 391–398, 2010.
  • Gupta et al. [2010] A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In WINE, pages 246–257, 2010.
  • Haba et al. [2020] R. Haba, E. Kazemi, M. Feldman, and A. Karbasi. Streaming submodular maximization under a $k$-set system constraint. In ICML, (arXiv:2002.03352), 2020.
  • Iyer and Bilmes [2013] R. K. Iyer and J. A. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In NIPS, pages 2436–2444, 2013.
  • Kempe et al. [2003] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137–146, 2003.
  • Krause and Guestrin [2011] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology, 2(4):1–20, 2011.
  • Kuhnle [2019] A. Kuhnle. Interlaced greedy algorithm for maximization of submodular functions in nearly linear time. In NIPS, pages 2371–2381, 2019.
  • Kuhnle et al. [2018] A. Kuhnle, J. D. Smith, V. Crawford, and M. Thai. Fast maximization of non-submodular, monotonic functions on the integer lattice. In ICML, pages 2786–2795, 2018.
  • Lee et al. [2010] J. Lee, V. S. Mirrokni, V. Nagarajan, and M. Sviridenko. Maximizing nonmonotone submodular functions under matroid or knapsack constraints. SIAM Journal on Discrete Mathematics, 23(4):2053–2078, 2010.
  • Leskovec and Krevl [2014] J. Leskovec and A. Krevl. Snap datasets: Stanford large network dataset collection. URL: http://snap.stanford.edu/, 2014.
  • Leskovec et al. [2007] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420–429, 2007.
  • Lin and Bilmes [2011] H. Lin and J. Bilmes. A class of submodular functions for document summarization. In ACL/HLT, pages 510–520, 2011.
  • Liu et al. [2014] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa. Entropy-rate clustering: Cluster analysis via maximizing a submodular function subject to a matroid constraint. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):99–112, 2014.
  • Mirzasoleiman et al. [2015] B. Mirzasoleiman, A. Badanidiyuru, A. Karbasi, J. Vondrák, and A. Krause. Lazier than lazy greedy. In AAAI, pages 1812–1818, 2015.
  • Mirzasoleiman et al. [2016] B. Mirzasoleiman, A. Badanidiyuru, and A. Karbasi. Fast constrained submodular maximization: Personalized data summarization. In ICML, pages 1358–1367, 2016.
  • Mirzasoleiman et al. [2018] B. Mirzasoleiman, S. Jegelka, and A. Krause. Streaming non-monotone submodular maximization: Personalized video summarization on the fly. In AAAI, pages 1379–1386, 2018.
  • Singla et al. [2014] A. Singla, I. Bogunovic, G. Bartok, A. Karbasi, and A. Krause. Near-optimally teaching the crowd to classify. In ICML, pages 154–162, 2014.
  • Singla et al. [2016] A. Singla, S. Tschiatschek, and A. Krause. Noisy submodular maximization via adaptive sampling with applications to crowdsourced image collection summarization. In AAAI, page 2037–2041, 2016.
  • Soma and Yoshida [2017] T. Soma and Y. Yoshida. Non-monotone dr-submodular function maximization. In AAAI, pages 898–904, 2017.
  • Soma and Yoshida [2018] T. Soma and Y. Yoshida. Maximizing monotone submodular functions over the integer lattice. Mathematical Programming, 172(1-2):539–563, 2018.
  • Vondrák [2013] J. Vondrák. Symmetry and approximability of submodular maximization problems. SIAM Journal on Computing, 42(1):265–304, 2013.
  • Williams et al. [2017] R. K. Williams, A. Gasparri, and G. Ulivi. Decentralized matroid optimization for topology constraints in multi-robot allocation problems. In ICRA, pages 293–300, 2017.
  • Xu et al. [2015] J. Xu, L. Mukherjee, Y. Li, J. Warner, J. M. Rehg, and V. Singh. Gaze-enabled egocentric video summarization via constrained submodular maximization. In CVPR, pages 2235–2244, 2015.

Appendix A: Missing Proofs

A.1 Proof of Lemma 1

Proof:  We only prove the existence of $\pi_{1}(\cdot)$, as the existence of $\pi_{2}(\cdot)$ can be proved in the same way. Suppose that the elements in $S_{1}$ are $\{u_{1},\cdots,u_{s}\}$ (listed according to the order in which they are added into $S_{1}$). We use an argument inspired by [15] to construct $\pi_{1}(\cdot)$. Let $L_{s}=O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}$. We execute the following iterations from $j=s$ down to $j=0$. At the beginning of the $j$-th iteration, we compute a set $A_{j}=\{x\in L_{j}\backslash\{u_{1},\cdots,u_{j-1}\}:\{u_{1},\cdots,u_{j-1},x\}\in\mathcal{I}\}$. If $u_{j}\in O_{1}^{+}\cup O_{1}^{-}$ (so $u_{j}\in A_{j}$), then we set $\pi_{1}(u_{j})=u_{j}$ and $D_{j}=\{u_{j}\}$. If $u_{j}\notin O_{1}^{+}\cup O_{1}^{-}$ and $A_{j}\neq\emptyset$, then we pick an arbitrary $e\in A_{j}$ and set $\pi_{1}(e)=u_{j}$ and $D_{j}=\{e\}$. If $A_{j}=\emptyset$, then we simply set $D_{j}=\emptyset$. After that, we set $L_{j-1}=L_{j}\backslash D_{j}$ and enter the $(j-1)$-th iteration.

From the above process, it can be easily seen that $\pi_{1}(\cdot)$ has the properties required by the lemma as long as it is a valid function. So we only need to prove that each $e\in O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}$ is mapped to an element in $S_{1}$, which is equivalent to proving $L_{0}=\emptyset$, as each $e\in L_{s}\backslash L_{0}$ is mapped to an element in $S_{1}$ according to the above process. In the following, we prove $L_{0}=\emptyset$ by induction, i.e., by proving $|L_{j}|\leq j$ for all $0\leq j\leq s$.

We first prove $|L_{s}|\leq s$. By way of contradiction, let us assume $|L_{s}|=|O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}|>s=|S_{1}|$. Then, there must exist some $x\in O_{2}^{-}\cup O_{3}$ satisfying $S_{1}\cup\{x\}\in\mathcal{I}$ according to the exchange property of matroids. Moreover, according to the definition of $O_{3}$, we also have $x\notin O_{3}$, which implies $x\in O_{2}^{-}$. So we can get $\mathrm{Pre}(x,S_{1})\cup\{x\}\in\mathcal{I}$ due to $\mathrm{Pre}(x,S_{1})\subseteq S_{1}$, $S_{1}\cup\{x\}\in\mathcal{I}$ and the hereditary property of independence systems, but this contradicts the definition of $O_{2}^{-}$. Therefore, $|L_{j}|\leq j$ holds when $j=s$.

Now suppose $|L_{j}|\leq j$ for a certain $j\leq s$. If $A_{j}\neq\emptyset$, then we have $D_{j}\neq\emptyset$ and hence $|L_{j-1}|=|L_{j}|-1\leq j-1$. If $A_{j}=\emptyset$, then we know that there does not exist $x\in L_{j}\backslash\{u_{1},\cdots,u_{j-1}\}$ such that $\{u_{1},\cdots,u_{j-1}\}\cup\{x\}\in\mathcal{I}$. This implies $|\{u_{1},\cdots,u_{j-1}\}|\geq|L_{j}|$ due to the exchange property of matroids. So we also have $|L_{j-1}|=|L_{j}|\leq j-1$, which completes the proof. $\square$
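The construction above is itself a small backward-greedy procedure, and the Python sketch below (our own transcription of the proof, assuming a matroid oracle is_independent, the elements of $S_{1}$ in insertion order, and the sets $L_{s}$ and $O_{1}^{+}\cup O_{1}^{-}$ as inputs) mirrors it step by step.

# Transcription of the construction of pi_1: iterate j = s, ..., 1; at step j map
# u_j to itself if u_j is in O_1^+ ∪ O_1^-, otherwise map some still-unassigned x in L_j
# with {u_1, ..., u_{j-1}, x} independent to u_j.
def build_pi1(S1_ordered, L, O1, is_independent):
    pi = {}
    remaining = set(L)                          # plays the role of L_j in the proof
    for j in range(len(S1_ordered), 0, -1):
        u_j = S1_ordered[j - 1]
        prefix = set(S1_ordered[:j - 1])        # {u_1, ..., u_{j-1}}
        if u_j in O1 and u_j in remaining:
            pi[u_j] = u_j                       # property 2 of Lemma 1
            remaining.discard(u_j)
            continue
        A_j = [x for x in remaining - prefix if is_independent(prefix | {x})]
        if A_j:
            pi[A_j[0]] = u_j                    # any element of A_j works
            remaining.remove(A_j[0])
    return pi                                   # injective by construction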

A.2 Proof of Lemma 2

For clarity, we decompose Lemma 2 into three lemmas (Lemmas 4-6) and prove each of them.

Lemma 4

The TwinGreedy algorithm satisfies

$$f(O_{1}^{+}\mid S_{2})\leq\sum_{e\in O_{1}^{+}}\delta(\pi_{1}(e));~~~~f(O_{2}^{+}\mid S_{1})\leq\sum_{e\in O_{2}^{+}}\delta(\pi_{2}(e)) \qquad (15)$$

Proof of Lemma 4:  We only prove the first inequality, as the second one can be proved in the same way. For any $e\in O_{1}^{+}$, consider the moment that TwinGreedy inserts $e$ into $S_{1}$. At that moment, adding $e$ into $S_{2}$ also does not violate the feasibility of $\mathcal{I}$ according to the definition of $O_{1}^{+}$. Therefore, we must have $f(e\mid\mathrm{Pre}(e,S_{2}))\leq\delta(e)$, because otherwise $e$ would not be inserted into $S_{1}$ according to the greedy rule of TwinGreedy. Using submodularity and the fact that $\pi_{1}(e)=e$, we get

$$f(e\mid S_{2})\leq f(e\mid\mathrm{Pre}(e,S_{2}))\leq\delta(e)=\delta(\pi_{1}(e)),~~~\forall e\in O_{1}^{+} \qquad (16)$$

and hence

$$f(O_{1}^{+}\mid S_{2})\leq\sum_{e\in O_{1}^{+}}f(e\mid S_{2})\leq\sum_{e\in O_{1}^{+}}\delta(\pi_{1}(e)), \qquad (17)$$

which completes the proof. $\square$

Lemma 5

The TwinGreedy algorithm satisfies

$$f(O_{1}^{-}\mid S_{2})\leq\sum_{e\in O_{1}^{-}}\delta(\pi_{2}(e));~~~~f(O_{2}^{-}\mid S_{1})\leq\sum_{e\in O_{2}^{-}}\delta(\pi_{1}(e)) \qquad (18)$$

Proof of Lemma 5:  We only prove the first inequality, as the second one can be proved in the same way. For any $e\in O_{1}^{-}$, consider the moment that TwinGreedy inserts $\pi_{2}(e)$ into $S_{2}$. At that moment, adding $e$ into $S_{2}$ also does not violate the feasibility of $\mathcal{I}$, as $\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I}$ according to Lemma 1. This implies that $e$ has not been inserted into $S_{1}$ yet. To see this, let us assume (by way of contradiction) that $e$ has already been added into $S_{1}$ when TwinGreedy inserts $\pi_{2}(e)$ into $S_{2}$. So we have $\mathrm{Pre}(e,S_{2})\subseteq\mathrm{Pre}(\pi_{2}(e),S_{2})$. As $\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I}$, we must have $\mathrm{Pre}(e,S_{2})\cup\{e\}\in\mathcal{I}$ due to the hereditary property of independence systems. However, this contradicts $\mathrm{Pre}(e,S_{2})\cup\{e\}\notin\mathcal{I}$, which holds as $e\in O_{1}^{-}$.

As ee has not been inserted into S1S_{1} yet at the moment that π2(e)\pi_{2}(e) is inserted into S2S_{2}, and Pre(π2(e),S2){e}\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I}, we know that ee must be a contender to π2(e)\pi_{2}(e) when TwinGreedy inserts π2(e)\pi_{2}(e) into S2S_{2}. Due to the greedy selection rule of the algorithm, this means δ(π2(e))=f(π2(e)Pre(π2(e),S2))f(ePre(π2(e),S2))\delta(\pi_{2}(e))=f(\pi_{2}(e)\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))\geq f(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2})). As Pre(π2(e),S2)S2\mathrm{Pre}(\pi_{2}(e),S_{2})\subseteq S_{2}, we also have f(ePre(π2(e),S2))f(eS2)f(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))\geq f(e\mid S_{2}). Putting these together, we have

f(O1S2)eO1f(eS2)eO1f(ePre(π2(e),S2))eO1δ(π2(e))\displaystyle f(O_{1}^{-}\mid S_{2})\leq\sum_{e\in O_{1}^{-}}f(e\mid S_{2})\leq\sum_{e\in O_{1}^{-}}f(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))\leq\sum_{e\in O_{1}^{-}}\delta(\pi_{2}(e)) (19)

which completes the proof. \square

Lemma 6

The TwinGreedy algorithm satisfies

f(O3S1)eO3δ(π1(e));f(O4S2)eO4δ(π2(e))\displaystyle f(O_{3}\mid S_{1})\leq\sum_{e\in O_{3}}\delta(\pi_{1}(e));~{}~{}f(O_{4}\mid S_{2})\leq\sum_{e\in O_{4}}\delta(\pi_{2}(e)) (20)

Proof of Lemma 6:  We only prove the first inequality, as the second one can be proved in the same way. Consider any eO3e\in O_{3}. According to Lemma 1, we have Pre(π1(e),S1){e}\mathrm{Pre}(\pi_{1}(e),S_{1})\cup\{e\}\in\mathcal{I}, which means that ee can be added into S1S_{1} without violating the feasibility of \mathcal{I} when π1(e)\pi_{1}(e) is added into S1S_{1}. According to the greedy rule of TwinGreedy and submodularity, we must have δ(π1(e))=f(π1(e)Pre(π1(e),S1))f(ePre(π1(e),S1))\delta(\pi_{1}(e))=f(\pi_{1}(e)\mid\mathrm{Pre}(\pi_{1}(e),S_{1}))\geq f(e\mid\mathrm{Pre}(\pi_{1}(e),S_{1})), because otherwise ee would have been added into S1S_{1} at that step, contradicting eO3e\in O_{3}. Therefore, we get

f(O3S1)eO3f(eS1)eO3f(ePre(π1(e),S1))eO3δ(π1(e))\displaystyle f(O_{3}\mid S_{1})\leq\sum_{e\in O_{3}}f(e\mid S_{1})\leq\sum_{e\in O_{3}}f(e\mid\mathrm{Pre}(\pi_{1}(e),S_{1}))\leq\sum_{e\in O_{3}}\delta(\pi_{1}(e)) (21)

which completes the proof. \square

A.3 Proof of Theorem 1

Proof:  We only consider the special case that S1S_{1} or S2S_{2} is empty, as the main proof of the theorem has been presented in the paper. Without loss of generality, we assume that S2S_{2} is empty. According to the greedy rule of the algorithm, we have

f(OS1)eOS1f(e)eOS1δ(e)eS1δ(e)=f(S1)\displaystyle f(O\cap S_{1}\mid\emptyset)\leq\sum_{e\in O\cap S_{1}}f(e\mid\emptyset)\leq\sum_{e\in O\cap S_{1}}\delta(e)\leq\sum_{e\in S_{1}}\delta(e)=f(S_{1}\mid\emptyset) (22)

and f(O\S1)eO\S1f(e)0f(O\backslash S_{1}\mid\emptyset)\leq\sum_{e\in O\backslash S_{1}}f(e\mid\emptyset)\leq 0, where the last inequality holds because any eO\S1e\in O\backslash S_{1} with a positive marginal gain f(e)f(e\mid\emptyset) would have been added into the empty set S2S_{2} by the greedy rule. Combining these with

f(O\S1)+f(OS1)f(O)+f(),\displaystyle f(O\backslash S_{1})+f(O\cap S_{1})\geq f(O)+f(\emptyset), (23)

we get f(S1)f(O)f(S_{1})\geq f(O), which proves that S1S_{1} is an optimal solution when S2S_{2} is empty. \square

A.4 Proof of Lemma 3

For clarity, we decompose Lemma 3 into three lemmas (Lemmas 7-9) and prove each of them.

Lemma 7

For the TwinGreedyFast algorithm, we have

f(O1+S2)eO1+δ(π1(e));f(O2+S1)eO2+δ(π2(e))\displaystyle f(O_{1}^{+}\mid S_{2})\leq\sum_{e\in O_{1}^{+}}\delta(\pi_{1}(e));~{}~{}f(O_{2}^{+}\mid S_{1})\leq\sum_{e\in O_{2}^{+}}\delta(\pi_{2}(e)) (24)

Proof of Lemma 7:  The proof is similar to that of Lemma 4, and we present the full proof for completeness. We will only prove the first inequality, as the second one can be proved in the same way. For any eO1+e\in O_{1}^{+}, consider the moment that TwinGreedyFast inserts ee into S1S_{1} and suppose that the current threshold is τ\tau. By the thresholding rule of the algorithm, we must have δ(e)τ\delta(e)\geq\tau. At that moment, adding ee into S2S_{2} also does not violate the feasibility of \mathcal{I} according to the definition of O1+O_{1}^{+}. So we must have f(ePre(e,S2))δ(e)f(e\mid\mathrm{Pre}(e,S_{2}))\leq\delta(e), because otherwise we have f(ePre(e,S2))>δ(e)τf(e\mid\mathrm{Pre}(e,S_{2}))>\delta(e)\geq\tau, and hence ee would have been inserted into S2S_{2} rather than S1S_{1} according to the greedy rule of TwinGreedyFast. Using submodularity and the fact that π1(e)=e\pi_{1}(e)=e, we get

f(eS2)f(ePre(e,S2))δ(e)=δ(π1(e)),eO1+\displaystyle f(e\mid S_{2})\leq f(e\mid\mathrm{Pre}(e,S_{2}))\leq\delta(e)=\delta(\pi_{1}(e)),~{}~{}~{}\forall e\in O_{1}^{+} (25)

and hence

f(O1+S2)eO1+f(eS2)eO1+δ(π1(e)),\displaystyle f(O_{1}^{+}\mid S_{2})\leq\sum_{e\in O_{1}^{+}}f(e\mid S_{2})\leq\sum_{e\in O_{1}^{+}}\delta(\pi_{1}(e)), (26)

which completes the proof. \square

Lemma 8

For the TwinGreedyFast Algorithm, we have

f(O1S2)(1+ϵ)eO1δ(π2(e));f(O2S1)(1+ϵ)eO2δ(π1(e))\displaystyle f(O_{1}^{-}\mid S_{2})\leq(1+\epsilon)\sum_{e\in O_{1}^{-}}\delta(\pi_{2}(e));~{}~{}f(O_{2}^{-}\mid S_{1})\leq(1+\epsilon)\sum_{e\in O_{2}^{-}}\delta(\pi_{1}(e)) (27)

Proof of Lemma 8:  We only prove the first inequality, as the second one can be proved in the same way. For any eO1e\in O_{1}^{-}, consider the moment that TwinGreedyFast adds π2(e)\pi_{2}(e) into S2S_{2}. Using the same reasoning as in the proof of Lemma 5, we can prove: (1) ee has not been inserted into S1S_{1} at the moment that π2(e)\pi_{2}(e) is inserted into S2S_{2}; (2) Pre(π2(e),S2){e}\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I} (due to Lemma 1).

Let τ\tau be the threshold set by the algorithm when π2(e)\pi_{2}(e) is inserted into S2S_{2}. So we must have δ(π2(e))τ\delta(\pi_{2}(e))\geq\tau. Moreover, we must have f(ePre(π2(e),S2))(1+ϵ)τf(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))\leq{(1+\epsilon)\tau}. To see this, let us assume f(ePre(π2(e),S2))>(1+ϵ)τf(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))>{(1+\epsilon)\tau} by way of contradiction. If τ=τmax\tau=\tau_{max}, then we get f(e)f(ePre(π2(e),S2))>(1+ϵ)τmaxf(e)\geq f(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))>(1+\epsilon)\tau_{max}, which contradicts f(e)τmaxf(e)\leq\tau_{max}. If τ<τmax\tau<\tau_{max}, then consider the moment that ee is checked by the TwinGreedyFast algorithm when the threshold is τ=(1+ϵ)τ\tau^{\prime}=(1+\epsilon)\tau. Let S2,τS_{2,\tau^{\prime}} be the set of elements in S2S_{2} at that moment. Then we have f(eS2,τ)τf(e\mid S_{2,\tau^{\prime}})\geq\tau^{\prime} due to S2,τPre(π2(e),S2)S_{2,\tau^{\prime}}\subseteq\mathrm{Pre}(\pi_{2}(e),S_{2}), the submodularity of f()f(\cdot), and the contradiction assumption above. Moreover, we must have S2,τ{e}S_{2,\tau^{\prime}}\cup\{e\}\in\mathcal{I} due to Pre(π2(e),S2){e}\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I} and the hereditary property of independence systems. Consequently, ee would have been added by the algorithm when the threshold was τ\tau^{\prime}, which contradicts the fact stated above that ee has not been added into S1S_{1} at the moment that π2(e)\pi_{2}(e) is inserted into S2S_{2} (under the smaller threshold τ\tau). According to the above reasoning, we get

f(O1S2)eO1f(eS2)eO1f(ePre(π2(e),S2))(1+ϵ)eO1δ(π2(e))\displaystyle f(O_{1}^{-}\mid S_{2})\leq\sum_{e\in O_{1}^{-}}f(e\mid S_{2})\leq\sum_{e\in O_{1}^{-}}f(e\mid\mathrm{Pre}(\pi_{2}(e),S_{2}))\leq(1+\epsilon)\sum_{e\in O_{1}^{-}}\delta(\pi_{2}(e)) (28)

which completes the proof. \square

Lemma 9

For the TwinGreedyFast algorithm, we have

f(O3S1)(1+ϵ)eO3δ(π1(e));f(O4S2)(1+ϵ)eO4δ(π2(e))\displaystyle f(O_{3}\mid S_{1})\leq(1+\epsilon)\sum_{e\in O_{3}}\delta(\pi_{1}(e));~{}~{}f(O_{4}\mid S_{2})\leq(1+\epsilon)\sum_{e\in O_{4}}\delta(\pi_{2}(e)) (29)

Proof of Lemma 9:  We only prove the first inequality, as the second one can be proved in the same way. Consider any eO3e\in O_{3}. According to Lemma 1, we have Pre(π1(e),S1){e}\mathrm{Pre}(\pi_{1}(e),S_{1})\cup\{e\}\in\mathcal{I}, i.e., ee can be added into S1S_{1} without violating the feasibility of \mathcal{I} when π1(e)\pi_{1}(e) is added into S1S_{1}. By reasoning similar to that in the proof of Lemma 8, we can get f(ePre(π1(e),S1))(1+ϵ)δ(π1(e))f(e\mid\mathrm{Pre}(\pi_{1}(e),S_{1}))\leq(1+\epsilon)\delta(\pi_{1}(e)), because otherwise ee would have been added into S1S_{1} at an earlier stage of the TwinGreedyFast algorithm (under a larger threshold) before π1(e)\pi_{1}(e) was added into S1S_{1}, but this contradicts eS1S2e\notin S_{1}\cup S_{2}. Therefore, we get

f(O3S1)eO3f(eS1)eO3f(ePre(π1(e),S1))(1+ϵ)eO3δ(π1(e)),\displaystyle f(O_{3}\mid S_{1})\leq\sum_{e\in O_{3}}f(e\mid S_{1})\leq\sum_{e\in O_{3}}f(e\mid\mathrm{Pre}(\pi_{1}(e),S_{1}))\leq(1+\epsilon)\sum_{e\in O_{3}}\delta(\pi_{1}(e)), (30)

which completes the proof. \square

A.5 Proof of Theorem 2

Proof:  In Theorem 3 of Appendix B, we will prove the performance bounds of TwinGreedyFast under a pp-set system constraint. The proof of Theorem 3 can also be used to prove Theorem 2, simply by setting p=1p=1. \square

Appendix B: Extensions for a pp-Set System Constraint

When the independence system (𝒩,)(\mathcal{N},\mathcal{I}) input to the TwinGreedyFast algorithm is a pp-set system, it returns a solution SS^{*} achieving a 12p+2ϵ\frac{1}{2p+2}-\epsilon approximation ratio. To prove this, we define O1+O_{1}^{+}, O1O_{1}^{-}, O2+O_{2}^{+}, O2O_{2}^{-}, O3O_{3}, O4O_{4}, Pre(e,Si)\mathrm{Pre}(e,S_{i}) and δ(e)\delta(e) in exactly the same way as in Definition 1, and then propose Lemma 10, which relaxes Lemma 1 by allowing the preimage of any element of S1S2S_{1}\cup S_{2} under π1()\pi_{1}(\cdot) or π2()\pi_{2}(\cdot) to contain at most pp elements. The proof of Lemma 10 is similar to that of Lemma 1. For the sake of completeness and clarity, we provide the full proof of Lemma 10 in the following:

Lemma 10

There exists a function π1:O1+O1O2O3S1\pi_{1}:O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}\mapsto S_{1} such that:

  1.

    For any eO1+O1O2O3e\in O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}, we have Pre(π1(e),S1){e}\mathrm{Pre}(\pi_{1}(e),S_{1})\cup\{e\}\in\mathcal{I}.

  2.

    For each eO1+O1e\in O_{1}^{+}\cup O_{1}^{-}, we have π1(e)=e\pi_{1}(e)=e.

  3.

    Let π11(y)={eO1+O1O2O3:π1(e)=y}\pi_{1}^{-1}(y)=\{e\in O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}:\pi_{1}(e)=y\} for any yS1y\in S_{1}. Then we have |π11(y)|p|\pi_{1}^{-1}(y)|\leq p for any yS1y\in S_{1}.

Similarly, there exists a function π2:O1O2+O2O4S2\pi_{2}:O_{1}^{-}\cup O_{2}^{+}\cup O_{2}^{-}\cup O_{4}\mapsto S_{2} such that: Pre(π2(e),S2){e}\mathrm{Pre}(\pi_{2}(e),S_{2})\cup\{e\}\in\mathcal{I} for each eO1O2+O2O4e\in O_{1}^{-}\cup O_{2}^{+}\cup O_{2}^{-}\cup O_{4}; π2(e)=e\pi_{2}(e)=e for each eO2+O2e\in O_{2}^{+}\cup O_{2}^{-}; and |π21(y)|p|\pi_{2}^{-1}(y)|\leq p for each yS2y\in S_{2}.

Proof of Lemma 10:  We only prove the existence of π1()\pi_{1}(\cdot), as the existence of π2()\pi_{2}(\cdot) can be proved in the same way. Suppose that the elements in S1S_{1} are {u1,,us}\{u_{1},\cdots,u_{s}\} (listed according to the order in which they are added into S1S_{1}). We use an argument inspired by [15] to construct π1()\pi_{1}(\cdot). Let Ls=O1+O1O2O3L_{s}=O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}. We execute the following iterations from j=sj=s down to j=1j=1. At the beginning of the jj-th iteration, we compute a set Aj={xLj\{u1,,uj1}:{u1,,uj1,x}}A_{j}=\{x\in L_{j}\backslash\{u_{1},\cdots,u_{j-1}\}:\{u_{1},\cdots,u_{j-1},x\}\in\mathcal{I}\}. If |Aj|p|A_{j}|\leq p, then we set Dj=AjD_{j}=A_{j}; if |Aj|>p|A_{j}|>p and ujO1+O1u_{j}\in O_{1}^{+}\cup O_{1}^{-} (so ujAju_{j}\in A_{j}), then we pick a subset DjAjD_{j}\subseteq A_{j} satisfying |Dj|=p|D_{j}|=p and ujDju_{j}\in D_{j}; if |Aj|>p|A_{j}|>p and ujO1+O1u_{j}\notin O_{1}^{+}\cup O_{1}^{-}, then we pick a subset DjAjD_{j}\subseteq A_{j} satisfying |Dj|=p|D_{j}|=p. After that, we set π1(e)=uj\pi_{1}(e)=u_{j} for each eDje\in D_{j} and set Lj1=Lj\DjL_{j-1}=L_{j}\backslash D_{j}, and then enter the (j1)(j-1)-th iteration.
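
To make this iterative construction concrete, the following Python sketch implements it under simplifying assumptions: it assumes an independence oracle is_independent(X) for (𝒩,)(\mathcal{N},\mathcal{I}), represents S1S_{1} as a list in insertion order, and all function and variable names are chosen for illustration only.

# Illustrative sketch of the construction of pi_1 in the proof of Lemma 10.
# is_independent(X) is assumed to return True iff X is independent in the p-set system.
def build_pi1(S1, L_s, O1_plus_minus, p, is_independent):
    """S1: elements u_1, ..., u_s of S_1 in insertion order;
    L_s: the set O_1^+ ∪ O_1^- ∪ O_2^- ∪ O_3;
    O1_plus_minus: the set O_1^+ ∪ O_1^- (elements that must satisfy pi_1(e) = e)."""
    pi1 = {}
    remaining = set(L_s)                      # plays the role of L_j
    for j in range(len(S1), 0, -1):           # j = s, s-1, ..., 1
        prefix = set(S1[:j - 1])              # {u_1, ..., u_{j-1}}
        u_j = S1[j - 1]
        # A_j: elements of L_j outside the prefix that can feasibly extend it.
        A_j = {x for x in remaining - prefix if is_independent(prefix | {x})}
        if len(A_j) <= p:
            D_j = set(A_j)
        else:
            # Keep u_j in D_j whenever pi_1(u_j) = u_j is required.
            D_j = {u_j} if (u_j in O1_plus_minus and u_j in A_j) else set()
            for x in A_j:                     # fill D_j up to exactly p elements
                if len(D_j) == p:
                    break
                D_j.add(x)
        for e in D_j:
            pi1[e] = u_j                      # map every element of D_j to u_j
        remaining -= D_j                      # L_{j-1} = L_j \ D_j
    return pi1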

From the above process, it can be easily seen that Conditions 1-3 in the lemma are satisfied. So we only need to prove that each eO1+O1O2O3e\in O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3} is mapped to an element in S1S_{1}, which is equivalent to proving L0=L_{0}=\emptyset, as each eLs\L0e\in L_{s}\backslash L_{0} is mapped to an element in S1S_{1} according to the above process. In the following, we prove L0=L_{0}=\emptyset by induction, i.e., by proving |Lj|pj|L_{j}|\leq pj for all 0js0\leq j\leq s.

When j=sj=s, consider the set M=S1O2O3M=S_{1}\cup O_{2}^{-}\cup O_{3}. Clearly, each element eO3e\in O_{3} satisfies S1{e}S_{1}\cup\{e\}\notin\mathcal{I} according to the definition of O3O_{3}. Besides, we must have S1{x}S_{1}\cup\{x\}\notin\mathcal{I} for each xO2x\in O_{2}^{-}, because otherwise there exists eO2e\in O_{2}^{-} satisfying S1{e}S_{1}\cup\{e\}\in\mathcal{I}, and hence we get Pre(e,S1){e}\mathrm{Pre}(e,S_{1})\cup\{e\}\in\mathcal{I} due to Pre(e,S1)S1\mathrm{Pre}(e,S_{1})\subseteq S_{1} and the hereditary property of independence systems; contradicting eO2e\in O_{2}^{-}. Therefore, we know that S1S_{1} is a base of MM. As O1+O1O2O3O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}\in\mathcal{I} and O1+O1O2O3MO_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}\subseteq M, we get |Ls|=|O1+O1O2O3|p|S1|=ps|L_{s}|=|O_{1}^{+}\cup O_{1}^{-}\cup O_{2}^{-}\cup O_{3}|\leq p|S_{1}|=ps according to the definition of pp-set system.

Now suppose that |Lj|pj|L_{j}|\leq pj for some jj with 1js1\leq j\leq s. If |Aj|>p|A_{j}|>p, then we have |Dj|=p|D_{j}|=p and hence |Lj1|=|Lj|pp(j1)|L_{j-1}|=|L_{j}|-p\leq p(j-1). If |Aj|p|A_{j}|\leq p, then we know that there does not exist xLj1\{u1,,uj1}x\in L_{j-1}\backslash\{u_{1},\cdots,u_{j-1}\} such that {u1,,uj1}{x}\{u_{1},\cdots,u_{j-1}\}\cup\{x\}\in\mathcal{I} due to the above process for constructing π1()\pi_{1}(\cdot). Now consider the set M={u1,,uj1}Lj1M^{\prime}=\{u_{1},\cdots,u_{j-1}\}\cup L_{j-1}; since {u1,,uj1}\{u_{1},\cdots,u_{j-1}\} is a base of MM^{\prime} and Lj1L_{j-1}\in\mathcal{I}, we get |Lj1|p(j1)|L_{j-1}|\leq p(j-1) according to the definition of a pp-set system.

The above reasoning proves |Lj|pj|L_{j}|\leq pj for all 0js0\leq j\leq s by induction, so we get L0=L_{0}=\emptyset and hence the lemma follows. \square

With Lemma 10, Lemma 3 still holds under a pp-set system constraint, as the proof of Lemma 3 only uses the hereditary property of independence systems and does not require that the functions π1()\pi_{1}(\cdot) and π2()\pi_{2}(\cdot) are injective. Therefore, we can still use Lemma 3 to prove the performance bounds of TwinGreedyFast under a pp-set system constraint, as shown in Theorem 3. Note that the proof of Theorem 3 can also be used to prove Theorem 2, simply by setting p=1p=1.

Theorem 3

When the independence system (𝒩,)(\mathcal{N},\mathcal{I}) input to TwinGreedyFast is a pp-set system, the TwinGreedyFast algorithm returns a solution SS^{*} with 12p+2ϵ\frac{1}{2p+2}-\epsilon approximation ratio, under time complexity of 𝒪(nϵlogrϵ)\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon}).

Proof of Theorem 3:  We first consider the special case that S1S_{1} or S2S_{2} is empty, and show that TwinGreedyFast achieves a 1ϵ1-\epsilon approximation ratio in this case. Without loss of generality, we assume S2S_{2} is empty. By reasoning similar to the proof of Theorem 1 (Appendix A.3), we get f(S1)f(OS1)f(S_{1}\mid\emptyset)\geq f(O\cap S_{1}\mid\emptyset). Besides, for each eO\S1e\in O\backslash S_{1}, we must have f(e)<τminf(e\mid\emptyset)<\tau_{min} (where τmin\tau_{min} is the smallest threshold tested by the algorithm), because otherwise ee would have been added into S1S_{1} or S2S_{2} by the TwinGreedyFast algorithm. By the submodularity of f()f(\cdot), we have

f(O)f()\displaystyle f(O)-f(\emptyset) \displaystyle\leq f(OS1)+f(O\S1)f(S1)+eO\S1f(e)\displaystyle f(O\cap S_{1}\mid\emptyset)+f(O\backslash S_{1}\mid\emptyset)\leq f(S_{1}\mid\emptyset)+\sum_{e\in O\backslash S_{1}}f(e\mid\emptyset)
\displaystyle\leq f(S1)+rτminf(S1)+rϵτmaxrf(S1)+ϵf(O),\displaystyle f(S_{1}\mid\emptyset)+r\cdot\tau_{min}\leq f(S_{1}\mid\emptyset)+r\cdot\frac{\epsilon\cdot\tau_{max}}{r}\leq f(S_{1}\mid\emptyset)+\epsilon f(O),

which proves that S1S_{1} has a 1ϵ1-\epsilon approximation ratio. In the sequel, we consider the case that S1S_{1}\neq\emptyset and S2S_{2}\neq\emptyset. Let O5=O\(S1S2O3)O_{5}=O\backslash(S_{1}\cup S_{2}\cup O_{3}) and O6=O\(S1S2O4)O_{6}=O\backslash(S_{1}\cup S_{2}\cup O_{4}). By submodularity, we have

f(OS1)f(S1)f(O2+S1)+f(O2S1)+f(O3S1)+f(O5S1)\displaystyle f(O\cup S_{1})-f(S_{1})\leq f(O_{2}^{+}\mid S_{1})+f(O_{2}^{-}\mid S_{1})+f(O_{3}\mid S_{1})+f(O_{5}\mid S_{1}) (31)
f(OS2)f(S2)f(O1+S2)+f(O1S2)+f(O4S2)+f(O6S2)\displaystyle f(O\cup S_{2})-f(S_{2})\leq f(O_{1}^{+}\mid S_{2})+f(O_{1}^{-}\mid S_{2})+f(O_{4}\mid S_{2})+f(O_{6}\mid S_{2}) (32)

Using Lemma 3, we get

f(O2+S1)+f(O2S1)+f(O3S1)+f(O1+S2)+f(O1S2)+f(O4S2)\displaystyle f(O_{2}^{+}\mid S_{1})+f(O_{2}^{-}\mid S_{1})+f(O_{3}\mid S_{1})+f(O_{1}^{+}\mid S_{2})+f(O_{1}^{-}\mid S_{2})+f(O_{4}\mid S_{2}) (33)
\displaystyle\leq (1+ϵ)[eO1+O2O3δ(π1(e))+eO1O2+O4δ(π2(e))]\displaystyle(1+\epsilon)\left[\sum_{e\in O_{1}^{+}\cup O_{2}^{-}\cup O_{3}}\delta(\pi_{1}(e))+\sum_{e\in O_{1}^{-}\cup O_{2}^{+}\cup O_{4}}\delta(\pi_{2}(e))\right]
\displaystyle\leq (1+ϵ)[eS1|π11(e)|δ(e)+eS2|π21(e)|δ(e)]\displaystyle(1+\epsilon)\left[\sum_{e\in S_{1}}|\pi_{1}^{-1}(e)|\cdot\delta(e)+\sum_{e\in S_{2}}|\pi_{2}^{-1}(e)|\cdot\delta(e)\right]
\displaystyle\leq (1+ϵ)p[eS1δ(e)+eS2δ(e)]\displaystyle(1+\epsilon)p\left[\sum_{e\in S_{1}}\delta(e)+\sum_{e\in S_{2}}\delta(e)\right]
\displaystyle\leq (1+ϵ)p[f(S1)+f(S2)],\displaystyle(1+\epsilon)p\left[f(S_{1})+f(S_{2})\right],

where the third inequality is due to Lemma 10 and the last inequality follows by telescoping the marginal gains δ(e)\delta(e) over the insertion order of S1S_{1} and S2S_{2} and using f()0f(\emptyset)\geq 0. Besides, according to the definition of O5O_{5}, we must have f(eS1)<τminf(e\mid S_{1})<\tau_{min} for each eO5e\in O_{5}, where τmin\tau_{min} is the smallest threshold tested by the algorithm, because otherwise ee would have been added into S1S_{1} as S1{e}S_{1}\cup\{e\}\in\mathcal{I}. Similarly, we get f(eS2)<τminf(e\mid S_{2})<\tau_{min} for each eO6e\in O_{6}. Therefore, we have

f(O5S1)eO5f(eS1)rτminrϵτmaxrϵf(O)\displaystyle f(O_{5}\mid S_{1})\leq\sum_{e\in O_{5}}f(e\mid S_{1})\leq r\cdot\tau_{min}\leq r\cdot\frac{\epsilon\cdot\tau_{max}}{r}\leq\epsilon\cdot f(O) (34)
f(O6S2)eO6f(eS2)rτminrϵτmaxrϵf(O)\displaystyle f(O_{6}\mid S_{2})\leq\sum_{e\in O_{6}}f(e\mid S_{2})\leq r\cdot\tau_{min}\leq r\cdot\frac{\epsilon\cdot\tau_{max}}{r}\leq\epsilon\cdot f(O) (35)

As f()f(\cdot) is a non-negative submodular function and S1S2=S_{1}\cap S_{2}=\emptyset, we have

f(O)f(O)+f(OS1S2)f(OS1)+f(OS2)\displaystyle f(O)\leq f(O)+f(O\cup S_{1}\cup S_{2})\leq f(O\cup S_{1})+f(O\cup S_{2}) (36)

By summing up Eqn. (31)-(36) and simplifying, we get

f(O)\displaystyle f(O) \displaystyle\leq [1+(1+ϵ)p][f(S1)+f(S2)]+2ϵf(O)\displaystyle[1+(1+\epsilon)p][f(S_{1})+f(S_{2})]+2\epsilon\cdot f(O)
\displaystyle\leq (2p+2+2pϵ)f(S)+2ϵf(O)\displaystyle(2p+2+2p\epsilon)f(S^{*})+2\epsilon\cdot f(O)

So we have f(S)12ϵ2p+2+2pϵf(O)(12p+2ϵ)f(O)f(S^{*})\geq\frac{1-2\epsilon}{2p+2+2p\epsilon}f(O)\geq(\frac{1}{2p+2}-\epsilon)f(O), where the last inequality can be verified by cross-multiplication, since (12ϵ)(2p+2)(1(2p+2)ϵ)(2p+2+2pϵ)=(4p2+2p)ϵ+2p(2p+2)ϵ20(1-2\epsilon)(2p+2)-(1-(2p+2)\epsilon)(2p+2+2p\epsilon)=(4p^{2}+2p)\epsilon+2p(2p+2)\epsilon^{2}\geq 0. Note that the TwinGreedyFast algorithm has at most 𝒪(log1+ϵrϵ)\mathcal{O}(\log_{1+\epsilon}\frac{r}{\epsilon}) iterations with 𝒪(n)\mathcal{O}(n) time complexity in each iteration. Therefore, the total time complexity is 𝒪(nϵlogrϵ)\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon}), which completes the proof. \square

Appendix C: Supplementary Materials on Experiments

C.1 Social Network Monitoring

It can be easily verified that the social network monitoring problem considered in Section 5 is a non-monotone submodular maximization problem subject to a partition matroid constraint. We provide additional experimental results on Barabasi-Albert (BA) random graphs, as shown in Fig. 3. In Fig. 3, we generate a BA graph with 10,000 nodes and m0=m=100m_{0}=m=100, and set h=5h=5 for Fig. 3(a)-(b) and h=10h=10 for Fig. 3(c)-(d), respectively. The other settings in Fig. 3 are the same as those used for the ER random graph in Section 5. It can be seen that the experimental results in Fig. 3 are qualitatively similar to those on the ER random graph, and TwinGreedyFast still runs more than an order of magnitude faster than the other three algorithms. Besides, it is observed from Fig. 3 that TwinGreedyFast and TwinGreedy perform comparably to Fantom and slightly outperform RRG and SampleGreedy on utility, and in some cases TwinGreedyFast/TwinGreedy can even outperform Fantom on utility.

Figure 3: Experimental results for social network monitoring on Barabasi-Albert (BA) random graph

In Table 2, we study how the utility of TwinGreedyFast is affected by the parameter ϵ\epsilon. The experimental results in Table 2 reveal that the utility of TwinGreedyFast increases slightly as ϵ\epsilon decreases, and barely changes once ϵ\epsilon is sufficiently small (e.g., ϵ0.02\epsilon\leq 0.02). Therefore, setting ϵ\epsilon to a relatively large value in (0,1)(0,1) causes only a small loss in utility for TwinGreedyFast.

Table 2: The utility of TwinGreedyFast (×105\times 10^{5}) vs. the parameter ϵ\epsilon (BA, h=5h=5)
ϵ\epsilon k=k=50 100 150 200 250 300 350 400 450 500
0.2 0.145 0.281 0.410 0.529 0.637 0.744 0.836 0.925 1.004 1.078
0.15 0.149 0.289 0.415 0.535 0.643 0.747 0.841 0.929 1.008 1.082
0.1 0.154 0.291 0.418 0.538 0.648 0.751 0.846 0.933 1.013 1.085
0.05 0.154 0.293 0.421 0.540 0.651 0.753 0.848 0.935 1.015 1.087
0.02 0.155 0.294 0.422 0.541 0.652 0.754 0.849 0.936 1.015 1.087
0.01 0.155 0.294 0.422 0.541 0.652 0.754 0.849 0.936 1.015 1.087
0.005 0.155 0.294 0.422 0.541 0.652 0.754 0.849 0.936 1.015 1.088

C.2 Multi-Product Viral Marketing

We first prove that the multi-product viral marketing application considered in Section 5 is an instance of the problem of non-monotone submodular maximization subject to a matroid constraint. Recall that we need to select kk seed nodes from a social network G=(V,E)G=(V,E) to promote mm products, and each node uVu\in V can be selected as a seed for at most one product. These requirements can be modeled as a matroid constraint, as proved in the following lemma:

Lemma 11

Define the ground set 𝒩=V×[m]\mathcal{N}=V\times[m] and ={X𝒩:|X|kuV:|X𝒩u|1}\mathcal{I}=\{X\subseteq\mathcal{N}:|X|\leq k\wedge\forall u\in V:|X\cap\mathcal{N}_{u}|\leq 1\}, where 𝒩u{(u,i):i[m]}\mathcal{N}_{u}\triangleq\{(u,i):i\in[m]\} for any uVu\in V. Then (𝒩,)(\mathcal{N},\mathcal{I}) is a matroid.

Proof of Lemma 11:  It is evident that (𝒩,)(\mathcal{N},\mathcal{I}) is an independence system. Next, we prove that it satisfies the exchange property. For any XX\in\mathcal{I} and YY\in\mathcal{I} satisfying |X|<|Y||X|<|Y|, there must exist some vVv\in V such that |Y𝒩v|>|X𝒩v||Y\cap\mathcal{N}_{v}|>|X\cap\mathcal{N}_{v}| (i.e., |Y𝒩v|=1|Y\cap\mathcal{N}_{v}|=1 and |X𝒩v|=0|X\cap\mathcal{N}_{v}|=0), because otherwise we have |X|=uV|X𝒩u|uV|Y𝒩u|=|Y||X|=\sum_{u\in V}|X\cap\mathcal{N}_{u}|\geq\sum_{u\in V}|Y\cap\mathcal{N}_{u}|=|Y|, contradicting |X|<|Y||X|<|Y|. As |X|<|Y|k|X|<|Y|\leq k, we can add the element in Y𝒩vY\cap\mathcal{N}_{v} into XX without violating the feasibility of \mathcal{I}, which proves that (𝒩,)(\mathcal{N},\mathcal{I}) satisfies the exchange property of matroids. \square
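
As an illustration, membership in \mathcal{I} can be checked in linear time. The following minimal Python sketch (with illustrative names, and assuming that each member of XX is given as a (node, product) pair) performs this check:

# Illustrative independence oracle for the matroid of Lemma 11: X is independent
# iff it contains at most k pairs and no node is used for more than one product.
def is_feasible_seed_assignment(X, k):
    if len(X) > k:
        return False
    nodes = [u for (u, i) in X]
    return len(nodes) == len(set(nodes))      # True iff no node appears twice

For example, is_feasible_seed_assignment({("a", 0), ("a", 1)}, k=3) returns False, since node "a" would be selected as a seed for two different products.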

Next, we prove that the objective function in multi-product viral marketing is a submodular function defined on 2𝒩2^{\mathcal{N}}:

Lemma 12

For any S𝒩S\subseteq\mathcal{N} and SS\neq\emptyset, define

f(S)=i[m]fi(Si)+(Bi[m]vSic(v))\displaystyle f(S)=\sum_{i\in[m]}f_{i}(S_{i})+\left(B-\sum_{i\in[m]}\sum_{v\in S_{i}}c(v)\right) (37)

where Si{u(u,i)S}S_{i}\triangleq\{u\mid(u,i)\in S\} and fi()f_{i}(\cdot) is a non-negative submodular function defined on 2V2^{V} (i.e., an influence spread function). We also define f()=0f(\emptyset)=0. Then f()f(\cdot) is a submodular function defined on 2𝒩2^{\mathcal{N}}.

Proof of Lemma 12:  For any ST𝒩S\subsetneqq T\subseteq\mathcal{N} and any x=(u,i)𝒩\Tx=(u,i)\in\mathcal{N}\backslash T, we must have uSiu\notin S_{i} and uTiu\notin T_{i}. So we get f(xT)=fi(uTi)c(u)f(x\mid T)=f_{i}(u\mid T_{i})-c(u). If SS\neq\emptyset, then we have f(xS)=fi(uSi)c(u)f(x\mid S)=f_{i}(u\mid S_{i})-c(u) and hence f(xT)f(xS)f(x\mid T)\leq f(x\mid S) due to SiTiS_{i}\subseteq T_{i} and the submodularity of fi()f_{i}(\cdot). If S=S=\emptyset, then we also have f(xS)=fi(u)+Bc(u)fi(uTi)c(u)=f(xT)f(x\mid S)=f_{i}(u)+B-c(u)\geq f_{i}(u\mid T_{i})-c(u)=f(x\mid T), which completes the proof. \square
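
The following minimal Python sketch (illustrative only) shows how f(S)f(S) in Eqn. (37) can be evaluated, assuming that the (estimated) spread functions are given as a list of callables indexed by product and that the node costs c(v)c(v) are stored in a dictionary; all names used below are illustrative.

# Illustrative evaluation of the objective in Eqn. (37). Products are indexed
# 0, ..., m-1 here for simplicity; influence[i] estimates f_i on a set of nodes,
# cost maps each node to c(v), and B is the constant offset (e.g., m times the
# total node cost).
def marketing_objective(S, influence, cost, B, m):
    if not S:                                  # f(empty set) = 0 by definition
        return 0.0
    value, total_cost = 0.0, 0.0
    for i in range(m):
        S_i = {u for (u, j) in S if j == i}    # seeds selected for product i
        value += influence[i](S_i)
        total_cost += sum(cost[u] for u in S_i)
    return value + (B - total_cost)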

As we set B=muVc(u)B=m\sum_{u\in V}c(u), the objective function f()f(\cdot) is also non-negative. Note that fi(A)f_{i}(A) denotes the total expected number of nodes in VV that can be activated by A(AV)A~{}(\forall A\subseteq V) under the celebrated Independent Cascade (IC) Model [34]. As evaluating fi(A)f_{i}(A) for any given AVA\subseteq V under the IC model is an NP-hard problem, we use the estimation method proposed in [7] to estimate fi(A)f_{i}(A), based on the concept of “Reverse Reachable Set” (RR-set). For completeness, we introduce this estimation method in the following:

Given a directed social network G=(V,E)G=(V,E) with each edge (u,v)(u,v) associated with a probability pu,vp_{u,v}, a random RR-set RR under the IC model is generated by: (1) remove each edge (u,v)E(u,v)\in E independently with probability 1pu,v1-p_{u,v} and reverse (u,v)(u,v)’s direction if it is not removed; (2) sample vVv\in V uniformly at random and set RR as the set of nodes reachable from vv in the graph generated by the first step. Given a set ZZ of random RR-sets, any i[m]i\in[m] and any AVA\subseteq V, we define

f^i(A)=RZ|V|min{1,|AR|}/|Z|\displaystyle\hat{f}_{i}(A)=\sum\nolimits_{R\in Z}|V|\cdot\min\{1,|A\cap R|\}/|Z| (38)

According to [7], f^i(A)\hat{f}_{i}(A) is an unbiased estimator of fi(A)f_{i}(A), and f^i()\hat{f}_{i}(\cdot) is also a non-negative monotone submodular function defined on 2V2^{V}. Therefore, in our experiments, we generate a set ZZ of one million random RR-sets and use f^i()\hat{f}_{i}(\cdot) to replace fi()f_{i}(\cdot) in the objective function shown in Eqn. (37), which keeps f()f(\cdot) as a non-negative submodular function defined on 2𝒩2^{\mathcal{N}}.
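
For concreteness, the RR-set sampling step and the estimator in Eqn. (38) can be sketched in Python as follows; the sketch assumes that the reversed adjacency lists are available as a dictionary in_edges mapping each node vv to the list of pairs (u,pu,v)(u,p_{u,v}) over its incoming edges, and it is meant as an illustration rather than an optimized implementation.

import random
from collections import deque

# Illustrative generation of one random RR-set under the IC model, following the
# two steps described above: each incoming edge (u, v) of a visited node is kept
# (and implicitly reversed) with probability p_uv, and the RR-set collects every
# node that reaches the uniformly sampled root through kept edges.
def sample_rr_set(nodes, in_edges):
    root = random.choice(nodes)
    rr_set = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p_uv in in_edges.get(v, []):
            if u not in rr_set and random.random() < p_uv:
                rr_set.add(u)
                queue.append(u)
    return rr_set

# Illustrative estimator of Eqn. (38): |V| times the fraction of RR-sets hit by A.
def estimate_spread(A, rr_sets, num_nodes):
    A = set(A)
    hits = sum(1 for R in rr_sets if A & R)
    return num_nodes * hits / len(rr_sets)

With a collection rr_sets of such RR-sets, estimate_spread(A, rr_sets, len(nodes)) corresponds to f^i(A)\hat{f}_{i}(A) in Eqn. (38).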