
Influence Maximization over Markovian Graphs: A Stochastic Optimization Approach

Buddhika Nettasinghe and Vikram Krishnamurthy. The authors are with the Department of Electrical and Computer Engineering, Cornell University and Cornell Tech, 2 West Loop Rd, New York, NY 10044, USA. E-mail: {dwn26, vikramk}@cornell.edu.
Abstract

This paper considers the problem of randomized influence maximization over a Markovian graph process: given a fixed set of nodes whose connectivity graph is evolving as a Markov chain, estimate the probability distribution (over this fixed set of nodes) that samples a node which will initiate the largest information cascade (in expectation). Further, it is assumed that the sampling process affects the evolution of the graph, i.e. the sampling distribution and the transition probability matrix are functionally dependent. In this setup, recursive stochastic optimization algorithms are presented to estimate the optimal sampling distribution for two cases: 1) the transition probabilities of the graph are unknown, but the graph can be observed perfectly; 2) the transition probabilities of the graph are known, but the graph is observed in noise. These algorithms consist of a neighborhood size estimation algorithm combined with a variance reduction method, a Bayesian filter and a stochastic gradient algorithm. Convergence of the algorithms is established theoretically, and numerical results are provided to illustrate how the algorithms work.

Index Terms:
Influence maximization, stochastic optimization, Markovian graphs, independent cascade model, variance reduction, Bayesian filter.

I Introduction

Influence maximization refers to the problem of identifying the most influential node (or set of nodes) in a network; it was first studied in the seminal paper [1]. However, most work on influence maximization so far has been limited by one or more of the following assumptions:

  1. Deterministic network (with no random evolution)

  2. Fully observed graph (instead of noisy observations of the graph)

  3. Passive nodes (as opposed to active nodes that are responsive to the influence maximization process).

This paper attempts to relax the above three assumptions. We develop stochastic optimization algorithms for influence maximization over a randomly evolving, partially observed network of active nodes (see Fig. 1 for a schematic overview of our approach).

Figure 1: Schematic diagram of the proposed stochastic optimization algorithm for influence maximization over a partially observed dynamic network of active nodes, showing how the algorithmic sub-components are integrated to form the closed loop (with feedback from the sampling process) stochastic optimization algorithm, and their organization in the paper.

To understand the motivation behind this problem, consider a social network graph where nodes represent individuals and directed edges represent connectivity between them. Assume that this graph evolves with time in a Markovian manner. (Dynamics of social networks, such as seasonal variations in friendship networks, can naturally be modeled as Markov processes; another example is a vehicular network in which the inter-vehicle communication/connectivity graph has a Markovian evolution due to vehicle movements. Refer to [2] for an example in the context of social networks.) Further, each individual can pass/receive messages (also called infections, depending on the context) to/from their neighbors by communicating over the directed edges of the graph. Communication over these edges incurs time delays that are independently and identically distributed (across the edges) according to a known distribution. An influence maximizer wants to periodically convey messages (e.g. viral marketing) that expire within a certain time window to the nodes in this evolving social network. Conveying each message is achieved by sampling a node (called the seed node) from the graph according to a probability distribution and giving the message to that seed node. The seed node then initiates an information cascade by transmitting the message to its neighbors with random delays. Nodes that receive the message from their neighbors continue to follow the same steps, until the message expires. It is assumed that the graph remains the same throughout the diffusion of one message, i.e. the graph evolves on a slower time scale compared to the expiration time of a message. Further, we allow the nodes of the social network to be active: nodes are aware of the sampling distribution and respond by modifying the transition probabilities of the graph according to that distribution (for example, due to the incentive they receive for being the most influential node; another example of active nodes is a network of computers adaptively modifying their connectivity network, depending on how vulnerable each computer is to a virus attack). This makes the transition probability matrix functionally dependent on the sampling distribution. In this setting, the goal of the influence maximizer is to compute the sampling distribution that maximizes the expected total number of nodes infected (the size of the information cascade) before a message expires, accounting for the randomness of the node sampling, the message diffusion process and the graph evolution. This motivates the aim of this paper: to devise a method for the influence maximizer to estimate the optimal sampling distribution recursively, with each message that is distributed.

The main results of this paper are two stochastic gradient algorithms that allow the influence maximizer to recursively estimate (track) the optimal sampling distribution in the following cases:

  1. the influence maximizer does not know the transition probability matrix but has perfect (non-noisy) observations of the sample path of the graph.

  2. the influence maximizer knows the transition probability matrix but has only partial (noisy) observations of the sample path of the graph evolution.

The key components of the above two algorithms (illustrated in Fig. 1) include the following.

  • Reduced variance neighborhood size estimation algorithm: Influence maximization requires estimating the influence of nodes, which can be posed as estimating the (expected) sizes of node neighborhoods. For this, we use a stochastic simulation based neighborhood size estimation algorithm (which utilizes a modified Dijkstra's algorithm combined with an exponential random variable assignment process), coupled with a variance reduction approach. It is shown that this reduced variance method improves the convergence of the proposed algorithms when tracking the optimal influence in a time evolving system.

  • Stochastic optimization with delayed observations of the graph process: The observations of the graph sample path in the two algorithms (in main contributions) are not assumed to be available in real time. Instead, it suffices if sample paths of finite length become available as batches of data with some time delay. (In most real world applications, one can only trace back the evolution of a social network over a period of time, the length of the finite sample path, instead of monitoring it in real time; e.g. how the graph evolved over the month of January becomes available to the influence maximizer only at the end of February, due to delays in obtaining data.) These finite length graph sample paths are used in a stochastic optimization method that is based on the simultaneous perturbation stochastic approximation (SPSA) method, coupled with a finite sample path gradient estimation method for Markov processes. The proposed algorithms are applicable even in the more general case where the system model (the state space of the Markovian graph process, the functional dependency between the sampling distribution and the transition probabilities, etc.) varies on a time scale slower than that of the stochastic optimization algorithm.

  • Bayesian filter for noisy graph sample paths: In the algorithm for the second case (in main contributions), the sample paths are assumed to be observed in noise. In this case, a Bayesian filter is utilized to estimate the underlying state of the graph using the noisy sample path as the input. The estimates computed by the Bayesian filter are then utilized in the stochastic optimization process while preserving the (weak) convergence of the stochastic approximation algorithm.

Related Work: The influence maximization problem was first posed in [1] as a combinatorial optimization problem for two widely accepted models of information spreading in social networks: the independent cascade model and the linear threshold model. [1] shows that solving this problem is NP-hard for both models and utilizes a greedy (submodular function) maximization approach to devise algorithms with a $1-\frac{1}{e}$ approximation guarantee. Since then, this problem and its variants have been widely studied using different techniques and models. [3, 4, 5, 6] study the problem in competitive/adversarial settings with multiple influence maximizers and provide equilibrium results and approximation algorithms. [7, 8] consider the case where not all nodes of the graph are initially accessible to the influence maximizer and propose a multistage stochastic optimization method that harnesses a phenomenon called the friendship paradox [9, 10]. Further, [11, 12] provide heuristic algorithms for influence maximization under the independent cascade model that are more efficient than the algorithms originally proposed in [1]. Our work is motivated by the problems studied in [13, 14]. [13] points out that estimating the expected cascade size under the independent cascade model can be posed as a neighborhood size estimation problem on a graph and utilizes a size estimation framework proposed in [15] (and used previously in [16]) to obtain an unbiased estimate of this quantity. [14] highlights that the influence maximization problem has mostly been studied in the context of static graphs and proposes a random probing method for the case where the graph may evolve randomly. Motivated by these observations, we focus on a continuous time variant [13, 17] of the independent cascade model and allow the underlying social network graph to evolve slowly as a Markov process. The solution approaches proposed in this paper belong to the class of recursive stochastic approximation methods, which have previously been utilized to solve many problems in the field of multi-agent networks [18, 19, 20].

Organization: Sec. II presents the network model, related definitions of influence on graphs, the definition of the main problem and, finally, a discussion of some practical details. Sec. III presents the recursive stochastic optimization algorithm (along with convergence theorems) that solves the main problem for the case of a fully observed graph with unknown transition probabilities (case 1 of the main contributions). Sec. IV extends this to the case of a partially observed graph with known transition probabilities (case 2 of the main contributions). Sec. V provides numerical results illustrating the presented algorithms.

II Diffusion Model and the Problem of Randomized Influence Maximization

This section describes the Markovian graph process, how information spreads in the graph (the information diffusion model) and the definition of the main problem. Further, motivation for the problem, supported by work in recent literature, is provided to highlight some practical details.

II-A Markovian Graph Process, Frequency of the Messages and the Independent Cascade (IC) Diffusion Model

Markovian Graph Process: The social network graph at discrete time instants $n=0,1,\dots$ is modeled as a directed graph $G_n=(V,E_n)$, consisting of a fixed set of individuals $V$ connected by the set of directed edges $E_n$. The graph evolves as a Markov process $\{G_n=(V,E_n)\}_{n\geq 0}$ with a finite state space $\mathcal{G}$ and a parameterized regular transition matrix $P_\theta$ with a unique stationary distribution $\pi_\theta$ (where $\theta$ denotes the parameter vector, which lies in a compact subset of an $M$-dimensional Euclidean space). Henceforth, $n$ denotes the discrete time scale on which the Markovian graph process evolves.

Frequency of the Messages: An influence maximizer distributes messages to the nodes in this evolving network. We assume that the messages are distributed periodically at time instants $n=kN$, where the positive integer $N$ denotes the period. (The assumption of periodic messages is not required for the problem considered in this paper or for the proposed algorithms; it suffices if the messages are distributed with some minimum time gap between them on the time scale $n$.)

Observations of the Finite Sample Paths: When the influence maximizer distributes the $(k+1)$th message at time $n=(k+1)N$, only the finite sample path of the Markovian graph process $\{G_n\}_{kN\leq n<kN+\bar{N}}$, for some fixed $\bar{N}\in\mathbb{N}$ with $\bar{N}<N$, is visible to the influence maximizer.

Information Diffusion Model: As explained in the example in Sec. I, nodes pass the messages they receive to their neighbors with random delays. This method of information spreading in graphs is formalized by the independent cascade (IC) model of information diffusion. Various versions of the IC model have been studied in the literature; we use a slightly different version of the IC model utilized in [13, 17], which is briefly as follows. When the influence maximizer gives a message to a set of seed nodes $\mathcal{I}_n\subseteq V$ at time $n$ (we consider a set of seed nodes instead of one seed node, as in the motivating example in Sec. I, to keep the definitions of this section more general), a time variable $t\in\mathbb{R}_{\geq 0}$ is initialized at $t=0$. Here, $t$ is the continuous time scale on which the diffusion of the message takes place and is different from the discrete time scale $n\in\mathbb{Z}_{\geq 0}$ on which the graph evolves (and the messages are distributed periodically). Further, the time scale $t$ is nested in the time scale $n$: $t$ is reset to $t=0$ with each new message distributed by the influence maximizer. The set of seed nodes $\mathcal{I}_n$ (which receives the message from the influence maximizer at time $n$, $t=0$) transmits the message through the edges attached to them in the graph $G_n=(V,E_n)$. Each edge $(j,i)\in E_n$ induces a random delay distributed according to the probability density function $\pi_{tt}(\tau_{ji})$ (called the transmission time distribution). Further, each edge transmits the message only once, and only the neighbor that infects a node first is considered the true parent node (with respect to the infection propagation) of the infected node. This process continues until the message expires at $t=T$ (henceforth referred to as the message expiration time), at which point the diffusion process stops. It is assumed that the discrete time scale $n$ on which the graph evolves is slower than $T$, i.e. the graph remains the same at least for $t\in[0,T]$. The same message spreading process then takes place when the next message is distributed.

For any realization of this random message spreading process (taking place in $t\in[0,T]$), the subgraph of $G_n$ induced by the set of edges through which the message propagated constitutes a directed acyclic graph (DAG). Further, due to the DAG induced by the propagation of a message, the infection time $t_i$ of each node $i\in V$ satisfies the shortest path property: conditional on a graph $G_n=(V,E_n)$ and a set of pairwise transmission times $\{\tau_{ji}\}_{(ji)\in E_n}$, the infection time $t_i$ of $i\in V$ is given by,

$t_i = g_i(\{\tau_{ji}\}_{(ji)\in E_n} \,|\, \mathcal{I})$ (1)

where $g_i(\{\tau_{ji}\}_{(ji)\in E_n} \,|\, \mathcal{I})$ denotes the length of the shortest path from $\mathcal{I}\subseteq V$ to $i\in V$, with edge lengths $\{\tau_{ji}\}_{(ji)\in E_n}$.
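To make the shortest path property (1) concrete, the following sketch samples one realization of the infection times. It is only an illustration under stated assumptions: the use of networkx, the exponential delay choice and the function name are ours, not part of the paper.

```python
import networkx as nx
import numpy as np

def sample_infection_times(G, seeds, rng, mean_delay=1.0):
    """One IC realization: draw i.i.d. delays tau_ji for every edge, then
    return t_i = shortest-path distance from the seed set (Eq. (1))."""
    for j, i in G.edges():
        G[j][i]["tau"] = rng.exponential(mean_delay)  # tau_ji ~ pi_tt
    return nx.multi_source_dijkstra_path_length(G, seeds, weight="tau")
```

The number of nodes infected before the message expires is then `sum(t <= T for t in times.values())`, which is the random variable whose expectation appears in Definition 1 below.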

II-B Influence of a Set of Nodes Conditional on a Graph

We use the following definition (from [13, 21]) of the influence of a set of nodes $\mathcal{I}\subseteq V$ on a graph $G=(V,E)$, for the diffusion model introduced in Sec. II-A. Further, $\mathbb{E}_{\pi_{tt}}\{\cdot\}$ is used to denote expectation over a set of pairwise transmission times (one per edge) sampled independently from $\pi_{tt}$.

Definition 1.

The influence, $\sigma_G(\mathcal{I},T)$, of a set of nodes $\mathcal{I}$, given a graph $G=(V,E)$ and a fixed time window $T$, is

$\sigma_G(\mathcal{I},T) = \mathbb{E}_{\pi_{tt}}\Big\{\sum_{i\in V}\mathds{1}_{\{t_i\leq T\}}\,\Big|\,\mathcal{I},G\Big\}$ (2)

$\qquad\quad\;\; = \sum_{i\in V}\mathbb{P}_{\pi_{tt}}\big\{t_i\leq T\,\big|\,\mathcal{I},G\big\}.$ (3)

In (2), the influence $\sigma_G(\mathcal{I},T)$ of the set of nodes $\mathcal{I}\subset V$ is the expected number of nodes infected within time $T$ by the diffusion process (characterized by the distribution of transmission delays $\pi_{tt}$) on the graph $G$ that started with the set of seed nodes $\mathcal{I}$.

Note that the infection times $\{t_i\}_{i\in V}$ in (2) are dependent random variables. Therefore, obtaining closed form expressions for the marginal cumulative distributions in (3) involves computing a $|V|-1$ dimensional integral, which is not possible (in closed form) for many general forms of the transmission delay distribution. Further, numerical evaluation of these marginal distributions is also infeasible, since it would involve discretizing the domain $[0,\infty)$ (refer to [13] for a more detailed description of the computational infeasibility of calculating the expected value in (2)).
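For reference, the naive simulation estimate of (2) (used as ground truth in Sec. V) can be sketched as follows, reusing sample_infection_times from the sketch in Sec. II-A; the repeated shortest path computations are what make this approach expensive on large graphs.

```python
def naive_influence(G, seeds, T, runs, rng, mean_delay=1.0):
    """Monte Carlo estimate of sigma_G(I, T): average, over repeated delay
    realizations, of the number of nodes infected within the window T."""
    total = 0
    for _ in range(runs):
        times = sample_infection_times(G, seeds, rng, mean_delay)
        total += sum(t <= T for t in times.values())
    return total / runs
```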

Why randomized selection of seed nodes? Assume that, at time instant $n=kN$, the influence maximizer needs to find a seed node $i^*\in V$ that maximizes $\sigma_{G_n}(\{i\},T)$ (in order to distribute the $k$th message, as explained in Sec. II-A). To achieve this, the influence maximizer needs to know the graph $G_n$, calculate the influence $\sigma_{G_n}(\{i\},T)$ for each $i\in V$, and then locate the $i^*\in V$ with the largest influence. Performing all these steps for each message is not feasible from a practical perspective; in particular, monitoring the graph in real time is practically difficult. Hence, a natural alternative is a randomized seed selection method. We consider the case where the seed node $i\in V$ is sampled from a parameterized probability distribution $p_\theta\in\Delta(V)$ such that the expected size of the cascade initiated by the sampled node is largest when $\theta=\theta^*$. This random seed selection approach leads to the formal definition of the main problem, which we call randomized influence maximization.

II-C Randomized Influence Maximization over a Markovian Graph Process: Problem Definition

We define the influence of a parameterized probability distribution $p_\theta$, $\theta\in\mathbb{R}^M$, on a Markovian graph process as below (henceforth we write $\sigma_G(i,T)$ for $\sigma_G(\{i\},T)$, with a slight abuse of notation).

Definition 2.

The influence, $C(\theta)$, of a probability distribution $p_\theta\in\Delta(V)$ on a Markovian graph process $\{G_n=(V,E_n)\}_{n\geq 0}$ with a finite state space $\mathcal{G}$, a regular transition matrix $P_\theta$ and a unique stationary distribution $\pi_\theta\in\Delta(\mathcal{G})$ is,

$C(\theta) = \mathbb{E}_{G\sim\pi_\theta}\{c(\theta,G)\}$ (4)

where,

$c(\theta,G) = \mathbb{E}_{i\sim p_\theta}\{\sigma_G(i,T)\}.$ (5)

Eq. (5) averages the influence $\sigma_G(i,T)$ over the sampling distribution $p_\theta$ to obtain $c(\theta,G)$, the influence of the sampling distribution $p_\theta$ conditional on the graph $G$. Then, (4) averages $c(\theta,G)$ over the unique stationary distribution $\pi_\theta$ of the graph process to obtain $C(\theta)$, the influence of the sampling distribution $p_\theta$ over the Markovian graph process. We will refer to $c(\theta,G_n)$ as the conditional (on the graph $G_n$) influence function at time $n$, and to $C(\theta)$ as the influence function, of the sampling distribution $p_\theta$.

Remark 1.

The sampling distribution $p_\theta$ is treated as a function of $\theta$, which also parameterizes the transition matrix $P_\theta$ of the graph process. This functional dependency models the feedback (via the active nodes) from the sampling of nodes (by the influence maximizer) to the evolution of the graph, as indicated by the feedback loop in Fig. 1 (and also discussed in an example setting in Sec. I).

In this context, the main problem studied in this paper can be defined as follows.

Problem Definition.

Randomized influence maximization over a Markovian graph process $\{G_n=(V,E_n)\}_{n\geq 0}$ with a finite state space $\mathcal{G}$, a regular transition matrix $P_{\theta_{k'}}$ and a unique stationary distribution $\pi_{\theta_{k'}}$ aims to recursively estimate the time evolving optima,

$\theta^*_{k'} \in \operatorname*{arg\,max}_{\theta\in\mathbb{R}^M} C_{k'}(\theta)$ (6)

where $C_{k'}(\theta)$ is the influence function (Definition 2), evolving on the slower time scale $k'\in\mathbb{Z}_{\geq 0}$ (compared to the message period $N$ over the time scale $n$).

Remark 2.

The influence function $C_{k'}(\theta)$ (and therefore the solution $\theta^*_{k'}$) in (6) is allowed to evolve over the slow time scale $k'$ because the functional dependency between the sampling distribution and the transition probability matrix (which specifies how the graph evolution depends on the sampling process) may change over time. Further, the state space $\mathcal{G}$ of the graph process may also evolve over time. Such changes in the system model are encapsulated by modeling the influence function as a time evolving quantity. However, to keep the notation manageable, we assume in the subsequent sections that the influence function $C_{k'}(\theta)$ does not evolve over time, i.e. it is assumed that

$C_{k'}(\theta) = C(\theta), \quad \forall\, k'\in\mathbb{Z}_{\geq 0}.$ (7)

This assumption can be removed without affecting the main algorithms presented in this paper. Further, we assume that $C(\theta)$ has a Lipschitz continuous derivative.

II-D Discussion about Key Aspects of the System Model

Networks as Markovian Graphs: We assumed that the graph evolution is Markovian. In a similar context to ours, [22] states that “Markovian evolving graphs are a natural and very general class of models for evolving graphs” and, studies the information spreading protocols on them during the stationary phase. Further, [23] considers information broadcasting methods on Markovian graph processes since they are “general enough for allowing us to model basically any kind of network evolution”.

Functional Dependency of the Sampling Process and Graph Evolution: [24] considers a weakly adversarial random broadcasting network: a randomly evolving broadcast network whose state at the next time instant is sampled from a distribution that minimizes the probability of successful communications. Analogous to this, the functional dependency in our model may represent how the network evolves adversarially to the influence maximization process (or some other underlying network dynamic that is responsive to the influence maximization). The influence maximizer need not be aware of such dependencies to apply the algorithms presented in the next sections.

III Stochastic Optimization Method: Perfectly Observed Graph Process with Unknown Dynamics

In this section, we propose a stochastic optimization method for the influence maximizer to recursively estimate the solution of the optimization problem in (6) under Assumption 1, stated below.

Assumption 1.

The influence maximizer can fully observe the sample paths of the Markovian graph process, but does not know the transition probabilities with which it evolves.

The schematic overview of the approach for solving (6) is shown in Fig. 1 (where the HMM filter is not needed in this section, due to Assumption 1). The next two subsections present the conditional influence estimation algorithm and the stochastic optimization algorithm.

III-A Estimating the Conditional Influence Function using Cohen’s Algorithm

The exact computation of the node influence $\sigma_G(i,T)$ in closed form, or estimating it with a naive sampling approach, is computationally infeasible (as explained in Sec. II-B). As a solution, [13] shows that the shortest path property of the IC model (explained in Sec. II-A) can be used to convert (2) into an expression involving a set of independent random variables as follows:

$\sigma_G(\mathcal{I},T) = \mathbb{E}_{\pi_{tt}}\Big\{\sum_{i\in V}\mathds{1}_{\{g_i(\{\tau_{ji}\}_{(ji)\in E}|\mathcal{I})\leq T\}}\,\Big|\,\mathcal{I},G\Big\}$ (8)

where $g_i(\{\tau_{ji}\}_{(ji)\in E}|\mathcal{I})$ is the shortest path length defined in (1). Further, note from (8) that the influence $\sigma_G(\mathcal{I},T)$ of the set $\mathcal{I}$ is the expected $T$-distance neighborhood size (the expected number of nodes within distance $T$ from the seed nodes), i.e.

$\sigma_G(\mathcal{I},T) = \mathbb{E}_{\pi_{tt}}\big\{|\mathcal{N}_G(\mathcal{I},T)|\,\big|\,\mathcal{I},G\big\}$ (9)

where,

$\mathcal{N}_G(\mathcal{I},T) = \{i\in V : g_i(\{\tau_{ji}\}_{(ji)\in E}|\mathcal{I})\leq T\}.$ (10)

Hence, we only need a neighborhood size estimation algorithm and samples from $\pi_{tt}$ to estimate the influence $\sigma_G(\mathcal{I},T)$ of the set $\mathcal{I}\subset V$.

III-A1 Cohen’s Algorithm

Based on (9), [13] utilizes a neighborhood size estimation algorithm proposed in [15] to obtain an unbiased estimate of the influence $\sigma_G(\mathcal{I},T)$; this algorithm is henceforth referred to as Cohen's algorithm. The main idea behind Cohen's algorithm is the fact that the minimum of a finite set of unit mean exponential random variables is an exponential random variable whose rate equals the number of random variables in the set. Hence, for a given graph $G=(V,E)$ and a transmission delay set $L_u=\{\tau_{ji}\}^u_{(ji)\in E}$ (where $u$ indexes the set of transmission delays), this algorithm assigns $m$ sets of exponential random variables $l_{u,j}=\{r_v^{u,j}\sim\exp(1):v\in V\}$, where $j=1,\dots,m$. Then, a modified Dijkstra's algorithm (refer to [15, 13] for a detailed description of its steps) finds the smallest exponential random variable $\bar{r}^{\mathcal{I}}_{u,j}$ within $T$-distance from the set $\mathcal{I}$, for each $l_{u,j}$, with respect to the transmission time set $L_u$. [15] shows that $\mathbb{E}\big\{\frac{m-1}{\sum_{j=1}^{m}\bar{r}^{\mathcal{I}}_{u,j}}\big|L_u\big\}$ is an unbiased estimate of $|\mathcal{N}_G(\mathcal{I},T)|$ conditional on $L_u$. Further, Cohen's algorithm estimates $\sigma_G(\mathcal{I},T)$ with a computational complexity that is near linear in the network size, compared with the complexity of a naive simulation approach (repeated calls to a shortest path algorithm, followed by averaging) [13].
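The following sketch shows the estimator for a single transmission delay set $L_u$ (held in the edge attribute "tau"); for clarity it enumerates the $T$-distance neighborhood explicitly, whereas the modified Dijkstra's algorithm of [15] avoids this enumeration and yields the near linear complexity quoted above. The helper name and the networkx usage are our assumptions.

```python
import networkx as nx
import numpy as np

def cohen_estimate(G, seeds, T, m, rng):
    """Estimate |N_G(I, T)| by (m-1) / sum_j rbar_j, where rbar_j is the
    minimum of unit-mean exponentials over nodes within T-distance."""
    dist = nx.multi_source_dijkstra_path_length(G, seeds, cutoff=T, weight="tau")
    n_reach = len(dist)  # nodes i with g_i(. | I) <= T, seeds included
    # Minimum of n_reach exp(1) variables is exp(n_reach); repeat m times.
    rbar = rng.exponential(1.0, size=(m, n_reach)).min(axis=1)
    return (m - 1) / rbar.sum()
```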

III-A2 Reduced Variance Estimation of Influence using Cohen’s algorithm

We propose Algorithm 1 to estimate the conditional influence function $c(\theta,G)$. The first four steps of Algorithm 1 are a reduced variance version of the Cohen's algorithm based estimator employed in [13], called CONTINEST. The unbiasedness and the reduced variance (compared to the algorithm used in [13]) of the estimates obtained using Algorithm 1 are established in Theorem 1.

Input: Sampling distribution $p_\theta$, cumulative distribution function $F_{tt}(\cdot)$ of the transmission time distribution $\pi_{tt}$, directed graph $G=(V,E)$
Output: Estimate of the conditional influence function: $\hat{c}(\theta,G)$
For all $v\in V$, execute the following steps simultaneously.

  1. Generate $s/2$ sets of uniform random variables:

     $\{\{U^u_{ji}\}_{(ji)\in E} : u=1,2,\dots,s/2\}$

  2. For each $u=1,\dots,s/2$, generate a correlated pair of random transmission time sets as follows:

     $L_u = \{F_{tt}^{-1}(U^u_{ji})\}_{(ji)\in E}$ (11)
     $L_{s/2+u} = \{F_{tt}^{-1}(1-U^u_{ji})\}_{(ji)\in E}$ (12)

  3. For each set $L_u$, $u=1,\dots,s$, assign $m$ sets of independent exponential random variables: $l_{u,j}=\{r_v^{u,j}\sim\exp(1):v\in V\}$, $j=1,\dots,m$.

  4. Compute the minimum exponential random variable $\bar{r}^v_{u,j}$ within $T$-distance from $v$ using the modified Dijkstra's algorithm, for each $l_{u,j}$. Calculate,

     $X(v,G) = \frac{1}{s}\sum_{u=1}^{s}\frac{m-1}{\sum_{j=1}^{m}\bar{r}^v_{u,j}}$ (13)

  5. Compute,

     $\hat{c}(\theta,G) = \sum_{v\in V} p_\theta(v)\,X(v,G).$ (14)
Algorithm 1 Conditional influence estimation algorithm
Theorem 1.

Consider a graph $G=(V,E)$.

  I. Given $v\in V$, $X(v,G)$ in (13) is an unbiased estimate of the node influence $\sigma_G(v,T)$, with a variance

     $\operatorname{Var}(X(v,G)) \leq \frac{1}{s}\Big(\frac{\sigma_G(v,T)^2}{m-2} + \frac{(m-1)\operatorname{Var}(|\mathcal{N}_G(v,T)|)}{m-2}\Big)$ (15)

  II. The output $\hat{c}(\theta,G)$ of Algorithm 1 is an unbiased estimate of the conditional influence function $c(\theta,G)$ defined in (5). The variance of $\hat{c}(\theta,G)$ is bounded above by the variance in the case where $L_u$ and $L_{s/2+u}$ (in step 2 of Algorithm 1) are generated independently.

Proof.

See Appendix A. ∎

Theorem 1 shows that the estimate $X(v,G)$ computed in Algorithm 1 has a smaller variance than the estimate computed by the Cohen's algorithm based influence estimation method (named CONTINEST) used in [13], which has a variance of $\frac{1}{s}\big(\frac{\sigma_G(v,T)^2}{m-2}+\frac{(m-1)\operatorname{Var}(|\mathcal{N}_G(v,T)|)}{m-2}\big)$. The reason for this reduced variance is the correlation created by using the same set of uniform random numbers (indexed by $u$) to generate a pair of transmission time sets ($L_u$ and $L_{s/2+u}$). Due to this use of the same random numbers for multiple realizations, the method is referred to as the method of common random numbers [25]. The reduced variance in the estimates $X(v,G)$ results in a reduced variance in the estimate $\hat{c}(\theta,G)$ of the conditional influence $c(\theta,G)$.
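The construction in steps 1-2 of Algorithm 1 is the classical antithetic pair $F_{tt}^{-1}(U)$, $F_{tt}^{-1}(1-U)$. The toy sketch below illustrates the induced negative correlation for exponential transmission times (where $F_{tt}^{-1}(u)=-\mu\ln(1-u)$); it is an illustration of the variance reduction principle under our own choice of parameters, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, half_s, n_edges = 1.0, 500, 100

U = rng.uniform(size=(half_s, n_edges))  # one U^u_ji per edge, s/2 sets
L = -mu * np.log(1.0 - U)                # L_u       = F_tt^{-1}(U)
L_anti = -mu * np.log(U)                 # L_{s/2+u} = F_tt^{-1}(1 - U)

# Any statistic that is monotone in the delays (here their sum, standing
# in for h^v) is negatively correlated across the pair (cf. Lemma 4 in
# Appendix A), which is what drives the variance bound in Theorem 1.
print(np.corrcoef(L.sum(axis=1), L_anti.sum(axis=1))[0, 1])  # negative
```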

III-B Stochastic Optimization Algorithm

Input: Initial parameterization $\theta_0$, transmission time distribution $\pi_{tt}$, observations of the graph process $\{G_n\}_{n\geq 0}$
Output: Estimate of the (locally) optimal solution $\theta^*$ of $C(\theta)$
For $k=0,1,\dots$, execute the following steps.

  1. Simulate the $M$ dimensional vector $d_k$ with random elements

     $d_k(i)=\begin{cases}+1 & \text{with probability } 0.5\\ -1 & \text{with probability } 0.5.\end{cases}$

  2. Set $\theta=\theta_k+\Delta d_k$, where $\Delta>0$.

  3. Sample a node from the network using $p_\theta$ and distribute the $(2k)$th message with the sampled node as the seed.

  4. Obtain $\hat{c}(\theta,G_n)$ for $n=2kN,2kN+1,\dots,2kN+\bar{N}-1$ using Algorithm 1 and calculate

     $\hat{C}_k(\theta_k+\Delta d_k)=\frac{1}{\bar{N}}\sum_{n=2kN}^{2kN+\bar{N}-1}\hat{c}(\theta,G_n).$ (16)

  5. Set $\theta=\theta_k-\Delta d_k$. Sample a node from the network using $p_\theta$ and distribute the $(2k+1)$th message with the sampled node as the seed.

  6. Obtain $\hat{c}(\theta,G_n)$ for $n=(2k+1)N,(2k+1)N+1,\dots,(2k+1)N+\bar{N}-1$ using Algorithm 1 and calculate

     $\hat{C}_k(\theta_k-\Delta d_k)=\frac{1}{\bar{N}}\sum_{n=(2k+1)N}^{(2k+1)N+\bar{N}-1}\hat{c}(\theta,G_n).$

  7. Obtain the gradient estimate,

     $\hat{\nabla}C_k(\theta_k)=\frac{\hat{C}_k(\theta_k+\Delta d_k)-\hat{C}_k(\theta_k-\Delta d_k)}{2\Delta}d_k.$ (17)

  8. Update the sampling distribution parameter $\theta_k$ via the stochastic gradient algorithm

     $\theta_{k+1}=\theta_k+\epsilon\hat{\nabla}C_k(\theta_k)$ (18)

     where $\epsilon>0$.

Algorithm 2: SPSA based algorithm to estimate $\theta^*$

We propose Algorithm 2 for solving the optimization problem (6), utilizing the conditional influence estimates obtained via Algorithm 1. Algorithm 2 is based on the simultaneous perturbation stochastic approximation (SPSA) algorithm (see [26, 27] for details). In general, the SPSA algorithm utilizes a finite difference estimate $\hat{\nabla}C_k(\theta_k)$ of the gradient of the function $C(\theta)$ at the point $\theta_k$ in the ($k$th iteration of the) recursion,

$\theta_{k+1}=\theta_k+\epsilon_k\hat{\nabla}C_k(\theta_k).$ (19)

In the $k$th iteration of Algorithm 2, the influence maximizer passes a message to the network using a seed node sampled from the distribution $p_\theta$, where $\theta=\theta_k+\Delta d_k$ (step 3). Sampling with $p_\theta$ causes the transition matrix to become $P_\theta$ (recall Remark 1). Then, in step 4, (16) averages the conditional influence estimates $\hat{c}(\theta,G_n)$ over $\bar{N}$ consecutive time instants (where $\bar{N}$ is the length of the available sample path, as defined in Sec. II-A) to obtain $\hat{C}_k(\theta_k+\Delta d_k)$, an asymptotically convergent estimate (by the law of large numbers for Markov chains [28, 29]) of $C(\theta_k+\Delta d_k)$. Similarly, steps 5 and 6 obtain $\hat{C}_k(\theta_k-\Delta d_k)$, an estimate of $C(\theta_k-\Delta d_k)$. Using these influence function estimates, (17) computes the finite difference gradient estimate $\hat{\nabla}C_k(\theta_k)$ in step 7. Finally, step 8 updates the $M$-dimensional parameter vector using the gradient estimate computed in (17). Some remarks about this algorithm are as follows.
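A compact sketch of one iteration of Algorithm 2 is given below; estimate_C stands for the sample-path averaging in steps 4 and 6 (i.e. (16)) and is assumed to be supplied by the caller, so the names here are illustrative.

```python
import numpy as np

def spsa_step(theta, estimate_C, Delta, eps, rng):
    """One SPSA update (steps 1-8 of Algorithm 2)."""
    d = rng.choice([-1.0, 1.0], size=theta.shape)  # step 1: +/-1 w.p. 0.5
    C_plus = estimate_C(theta + Delta * d)         # steps 2-4: message 2k
    C_minus = estimate_C(theta - Delta * d)        # steps 5-6: message 2k+1
    grad = (C_plus - C_minus) / (2.0 * Delta) * d  # step 7: Eq. (17)
    return theta + eps * grad                      # step 8: Eq. (18)
```

Note that a single pair of function evaluations yields a gradient estimate for all $M$ components of $\theta_k$, which is the point of Remark 4 below.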

Remark 3.

Algorithm 2 operates on two nested time scales, which are as follows (from the fastest to the slowest):

  1. $t$ - the continuous time scale on which the information diffusion takes place in a given realization of the graph.

  2. $n$ - the discrete time scale on which the graph evolves.

Further, updating the parameter vector takes place periodically over the scale $n$, with a period of $2N$ (where $N$ is the time duration between two messages, as defined in Sec. II-A).

Remark 4.

Note that all elements of the parameter vector are simultaneously perturbed in the SPSA based approach. Therefore, the parameter vector is updated once every two messages. This is in contrast to other finite difference methods such as the Kiefer-Wolfowitz method, which requires $2M$ messages (where $M$ is the dimension of the parameter vector, as defined in Sec. II-A) for each update.

Next, we establish the convergence of Algorithm 2 using standard results that give sufficient conditions for the convergence of recursive stochastic gradient algorithms (for details, see [27, 30, 31]).

Theorem 2.

The sequence $\{\theta_k\}_{k\geq 0}$ in (18) converges weakly to a locally optimal parameter $\theta^*$.

Proof.

See Appendix B. ∎

IV Stochastic Optimization Method: Partially Observed Graph Process with Known Dynamics

In this section, we assume that the influence maximizer can observe only a small part of the social network graph at each time instant. The aim of this section is to extend the stochastic optimization framework proposed in Sec. III to this partially observed setting.

IV-A Partially Observed Graph Process

In some applications, the influence maximizer can observe only a small part of the full network $G_n$ at any time instant $n$. Let $G^{\bar{V}}$ denote the subgraph of $G=(V,E)$ induced by the set of nodes $\bar{V}\subseteq V$. We consider the case where the observable part at time $n$ is the subgraph $G_n^{\bar{V}}$ of $G_n$ induced by a fixed subset of nodes $\bar{V}\subseteq V$. (For example, consider the friendship network $G_n=(V,E_n)$ of all the high school students in a city at time $n$. The smaller observable part could be the friendship network formed by the students of one particular high school $V'\subset V$, which is a subgraph of $G_n$; the influence maximizer then needs to perform influence maximization by observing this subgraph.) The observation space $\bar{\mathcal{G}}$ of the Markovian graph process can then be defined as,

$\bar{\mathcal{G}} = \bigcup_{G\in\mathcal{G}} G^{\bar{V}},$ (20)

which consists of the subgraphs induced by $\bar{V}$ in each graph $G\in\mathcal{G}$ (the case $\bar{V}=V$ corresponds to the perfectly observed case). For each $G\in\mathcal{G}$ and $\bar{G}\in\bar{\mathcal{G}}$, the observation likelihood, denoted by $B_{G\bar{G}}$, is defined as,

$B_{G\bar{G}} = \mathbb{P}(G_n^{\bar{V}}=\bar{G} \,|\, G_n=G).$ (21)

In our system model, these observation likelihoods can take only binary values:

$B_{G\bar{G}} = \begin{cases}1 & \text{if } G^{\bar{V}}=\bar{G}\\ 0 & \text{otherwise,}\end{cases}$ (22)

i.e. $B_{G\bar{G}}=1$ if the subgraph induced by the node set $\bar{V}$ in the graph $G\in\mathcal{G}$ is $\bar{G}\in\bar{\mathcal{G}}$, and $0$ otherwise. In this setting, our main assumption is the following.
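Under (22), the matrix $B$ can be constructed directly by computing the induced subgraphs; a sketch (with illustrative helper names, using networkx) follows.

```python
import networkx as nx

def observation_matrix(state_space, V_bar):
    """Binary likelihoods (22): B[g][o] = 1 iff state graph g induces
    observation o on the observable node set V_bar."""
    def induced(G):
        return frozenset(G.subgraph(V_bar).edges())
    obs_space = list({induced(G) for G in state_space})  # the space G-bar
    return [[1.0 if induced(G) == o else 0.0 for o in obs_space]
            for G in state_space]
```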

Assumption 2.

The measurement likelihood matrix $B$ and the parameterized transition probability matrix $P_\theta$ are known to the influence maximizer, but the (finite) sample paths of the Markovian graph process are observed in noise.

Then, the aim of the influence maximizer is to recursively solve the influence maximization problem (6), utilizing the information assumed to be known in Assumption 2. However, solving (6) is non-trivial even under Assumption 2, as emphasized in the following remark.

Remark 5.

Even when $P_\theta$ is known, it is intractable to compute $\pi_\theta$ in closed form in many cases [32]. Hence, Assumption 2 does not imply that a closed form expression of the objective function in (4) can be obtained; solving (6) therefore remains a non-trivial problem.

IV-B Randomized Influence Maximization using HMM Filter Estimates

Input: $P_\theta$, $B$ and initial prior $\pi_0$
Output: Finite sample estimate $\hat{C}^\theta_N$ of the influence function $C(\theta)$

  1. For every time instant $n=1,2,\dots,N-1$, given the observation $G^{\bar{V}}_{n+1}$, update the $|\mathcal{G}|$-dimensional posterior:

     $\pi^\theta_{n+1} = T(\pi^\theta_n, G^{\bar{V}}_{n+1}) = \frac{B_{G^{\bar{V}}_{n+1}} P'_\theta \pi^\theta_n}{\sigma(\pi^\theta_n, G^{\bar{V}}_{n+1})}$ (23)

     where,

     $\sigma(\pi^\theta_n, G^{\bar{V}}_{n+1}) = \mathbf{1}' B_{G^{\bar{V}}_{n+1}} P'_\theta \pi^\theta_n$ (24)

     and $\mathbf{1}$ denotes the column vector with all elements equal to one.

  2. Compute the estimate of the influence function $C(\theta)$,

     $\hat{C}^\theta_N = \frac{\sum_{n=0}^{N-1} \hat{c}'_\theta \pi^\theta_n}{N},$ (25)

     where $\hat{c}_\theta$ denotes the column vector with elements $\hat{c}(\theta,G_i)$, $G_i\in\mathcal{G}$.

Algorithm 3 Hidden Markov Model (HMM) Filter Algorithm for Tracking the Graph Process

Assumption 2 made in Sec. IV-A makes it possible to implement an HMM filter (see [32] for a detailed treatment of HMM filters and related results). The HMM filter is a finite dimensional Bayesian filter that recursively (with each observation) computes $\pi^\theta_n$, the probability distribution of the state of the graph conditional on the sequence of observations $\{G^{\bar{V}}_0,\dots,G^{\bar{V}}_n\}$. Algorithm 3 gives the HMM filter algorithm and Theorem 3 establishes the asymptotic convergence of the influence function estimate obtained from it.
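A sketch of the filtering recursion (23)-(24) and the estimate (25) in vectorized form is given below; P, B (as NumPy arrays), the observation index sequence and the vector c_hat of conditional influence estimates are assumed given, and the function name is ours.

```python
import numpy as np

def hmm_influence_estimate(P, B, obs, c_hat, pi0):
    """Run (23)-(24) along an observation sequence and return (25).
    P[i, j] = P(G_j | G_i); B[:, o] = likelihoods of observation index o."""
    pi = pi0.copy()
    total = float(c_hat @ pi)          # n = 0 term of the sum in (25)
    for o in obs[1:]:
        unnorm = B[:, o] * (P.T @ pi)  # numerator of (23)
        pi = unnorm / unnorm.sum()     # normalize by sigma, Eq. (24)
        total += float(c_hat @ pi)
    return total / len(obs)            # Eq. (25)
```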

Theorem 3.

The finite sample estimate $\hat{C}^\theta_N$ of the influence function obtained in (25) is an asymptotically unbiased estimate of the influence function $C(\theta)$, i.e.

$\lim_{N\to\infty}\mathbb{E}\{\hat{C}^\theta_N\} = C(\theta).$ (26)

Further, $\{\theta_k\}_{k\geq 0}$ in (18) converges weakly to a locally optimal parameter $\theta^*$ when the estimates computed in steps 4 and 6 of Algorithm 2 are replaced by the estimates obtained using Algorithm 3.

Proof.

See Appendix C. ∎

Finally, Algorithm 2 can be modified by replacing the estimates computed in steps 4 and 6 with the influence function estimates obtained using the HMM filter in Algorithm 3.

V Numerical Results

Figure 2: State space $\mathcal{G}$ of the Markovian graph process $\{G_n\}_{n\geq 0}$. (a) Graph $G^1=(V,E^1)$ with two equal sized clusters. (b) Graph $G^2=(V,E^2)$ with a single large cluster.

In this section, we apply the stochastic optimization algorithm presented in Sec. III to an example setting and illustrate its convergence within a feasible number of iterations.

V-A Experimental Setup

We use the Stochastic Block Model (SBM) as a generative model to create the graphs used in this section. SBMs have been widely studied in statistics [33, 34, 35, 36, 37] and network science [38, 39] as generative models that closely resemble real world networks.

State space of the graph process: We consider the graph process obtained by Markovian switching between the two graphs in Fig. 2: a graph with two dense, equal sized clusters (graph $G^1$) and a graph where most of the nodes (45 out of 50) are in a single dense cluster (graph $G^2$). These graphs are sampled from SBM models with the following parameter values: $G^1$ with cluster sizes 25 and 25, within cluster edge probability $p_w^{SBM}=0.3$ and between cluster edge probability $p_b^{SBM}=0.01$; $G^2$ with cluster sizes 45 and 5, within cluster edge probability $p_w^{SBM}=0.3$ and between cluster edge probability $p_b^{SBM}=0.01$. This graph process is motivated by the clustered and non-clustered states of a social network.
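For concreteness, the two SBM states can be generated with networkx as sketched below; the use of networkx and the seed value are our choices (the paper's graphs are directed, hence directed=True).

```python
import networkx as nx

p_w, p_b = 0.3, 0.01  # within- and between-cluster edge probabilities
P_blocks = [[p_w, p_b], [p_b, p_w]]
G1 = nx.stochastic_block_model([25, 25], P_blocks, directed=True, seed=1)
G2 = nx.stochastic_block_model([45, 5], P_blocks, directed=True, seed=1)
```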

Sampling distribution and the graph evolution: We consider the case where the influence maximizer samples from a subset of nodes consisting of the two nodes $v_1$ and $v_2$, using the parameterized probability distribution $p_\theta = [\cos^2(\theta)\;\; \sin^2(\theta)]'$. These two nodes are located in different clusters of the graphs $G^1, G^2$. The transition probabilities may depend on this sampling distribution (representing, for example, the adversarial networks/nodes explained in Sec. II-D). However, exact functional characterizations of such dependencies are not known in many practical applications, and the form of these dependencies may change over time (recall Remark 2). This experimental setup considers a graph process with a stationary distribution $\pi_\theta = p_\theta$ in order to have a closed form influence function as ground truth (against which the accuracy of the estimates can be compared). In an actual implementation of the algorithm, this functional dependency need not be known to the influence maximizer.

Influence Functions: The transmission time distribution was selected to be an exponential distribution with mean $1$ for edges within a cluster and an exponential distribution with mean $10$ for between cluster edges (in the SBM). Further, the message expiration time was selected to be $T=1.5$. Then, the influences of the nodes $v_1, v_2$ on the graphs $G^1$ and $G^2$ were estimated as follows, by evaluating the integral in Definition 1 with a naive simulation method (repeated use of the shortest path algorithm): $\sigma_{G^1}(v_1,T)=25.2$, $\sigma_{G^1}(v_2,T)=23.2$, $\sigma_{G^2}(v_1,T)=45.1$, $\sigma_{G^2}(v_2,T)=5.8$. These values, together with the expressions for $p_\theta$ and $\pi_\theta$, were used to obtain the following expression for the influence function (defined in Definition 2), against which the algorithm's estimates are compared:

$C(\theta) = \pi_\theta(G^1)\big(\sigma_{G^1}(v_1,T)\,p_\theta(v_1) + \sigma_{G^1}(v_2,T)\,p_\theta(v_2)\big) + \pi_\theta(G^2)\big(\sigma_{G^2}(v_1,T)\,p_\theta(v_1) + \sigma_{G^2}(v_2,T)\,p_\theta(v_2)\big) = 17.9\sin^2(\theta) - 37.3\sin^4(\theta) + 25.2.$ (27)

In this context, the goal of our algorithm is to locate the value of $\theta$ that maximizes this function, without using knowledge of $C(\theta)$ or $\pi_\theta$.
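Since (27) is quadratic in $x=\sin^2(\theta)$, the target of the search can be read off in closed form (a short calculation we add here as a check on the ground truth):

$\frac{dC}{dx} = 17.9 - 74.6x = 0 \;\Rightarrow\; x^* = \frac{17.9}{74.6} \approx 0.24,$

so $\theta^* = \arcsin\sqrt{x^*} \approx 0.51$ rad and $C(\theta^*) \approx 25.2 + 17.9(0.24) - 37.3(0.24)^2 \approx 27.3$.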

V-B Convergence of the Recursive Algorithm for Influence Maximization

Figure 3: Parameter value $\theta_k$ and the absolute value of the error (difference between the current and maximum influence) versus algorithm iteration, showing the convergence of the algorithm in a feasible number of iterations.

Algorithm 2 was applied in an experimental setting with the parameters specified in Sec. V-A. The length of the observable sample path of the graph process was assumed to be $\bar{N}=30$. Further, in Algorithm 1, the number of transmission time sets ($s$) and the number of exponential random variable sets ($m$) were both set to $10$.

With these numerical values for the parameters, Fig. 3 shows the variation of the absolute error (absolute value of the difference between the current and maximum influence) and the parameter value against the iteration count. The algorithm finds the optimal parameter in fewer than 50 iterations. Consequently, since the algorithm is capable of tracking the time evolving optima, any change in the system model results in a suboptimal (expected) influence for only about 50 iterations.

V-C Effect of variance reduction on convergence and tracking the optima of a time-varying influence function

Figure 4: Absolute value of the error versus algorithm iteration, with and without the common random number based variance reduction, illustrating the importance of reduced variance in tracking the optima of a time evolving influence function.

Here we examine how the proposed stochastic approximation algorithm tracks the optima when the system model changes on a slower time scale, and the effect of the reduced variance Algorithm 1 on tracking accuracy. For this, the experimental setup described in Sec. V-A was used again, and a sudden change in the influence function (obtained by changing the state space of the graph process and the functional dependency) was introduced at iteration 200. In this setting, Fig. 4 depicts the variation of the absolute error in influence with the algorithm iteration for two cases: with the reduced variance Algorithm 1 (red curve) and without the variance reduction approach (black curve). Fig. 4 shows that the variance reduction method speeds up the initial convergence to the optima (iterations 1 to 50) as well as the tracking of the optima after the sudden change in the influence function (iterations 200 to 300). Further, after convergence (iterations 50 to 200 and 300 to 500), the reduced variance approach is less noisy than the method without variance reduction. Hence, the proposed approach is capable of tracking the optimal sampling distribution in a slowly evolving system (e.g. a varying graph state space or evolving functional dependencies).

VI Conclusion

This paper considered the problem of randomized influence maximization over a Markovian graph process: given a fixed set of nodes whose connectivity graph evolves as a Markov chain, estimate the probability distribution (over this fixed set of nodes) that samples a node which will initiate the largest information cascade (in expectation). The evolution of the graph was allowed to depend functionally on the sampling probability distribution in order to keep the problem general. This was formulated as the problem of tracking the optimal solution of a (time-varying) optimization problem whose objective function (the influence function) has no available closed form expression. In this setting, two stochastic gradient algorithms were presented to estimate the optimal sampling distribution for two cases: 1) the transition probabilities of the graph are unknown, but the graph can be observed perfectly; 2) the transition probabilities of the graph are known, but the graph is observed in noise. These algorithms are based on the simultaneous perturbation stochastic approximation method, which requires only noisy estimates of the influence function. These noisy estimates were obtained by combining a neighborhood size estimation algorithm with a variance reduction method and then averaging over a finite sample path of the graph process. The convergence of the proposed methods was established theoretically and illustrated with numerical examples. The numerical results show that, with the reduced variance approach, the algorithms are capable of tracking the optimal influence in a time varying system model (e.g. with changing graph state spaces).

Appendix A Proof of Theorem 1

Let $h^v(L_u)$ denote the size of the $T$-distance neighborhood $\mathcal{N}_G(v,T)$ of a node $v\in V$ of graph $G=(V,E)$, conditional on a transmission time set $L_u=\{\tau_{ji}\}^u_{(ji)\in E}$. Further, let $\hat{h}^v(L_u)$ denote $\frac{m-1}{\sum_{j=1}^{m}\bar{r}^v_{u,j}}$.

$\mathbb{E}\{X(v,G)\} = \mathbb{E}\bigg\{\frac{1}{s}\sum_{u=1}^{s}\frac{m-1}{\sum_{j=1}^{m}\bar{r}^v_{u,j}}\bigg\}$ (28)

$= \frac{1}{s}\sum_{u=1}^{s}\mathbb{E}\{\hat{h}^v(L_u)\}$ (29)

$= \frac{1}{s}\sum_{u=1}^{s}\mathbb{E}\{\mathbb{E}\{\hat{h}^v(L_u)\,|\,L_u\}\}$ (30)

$= \frac{1}{s}\sum_{u=1}^{s}\mathbb{E}\{h^v(L_u)\}$ (31)

(from the conditional unbiasedness proved in [15])

$= \sigma_G(v,T)$ (by (9)) (32)

To analyze the variance of $X(v,G)$, first note that $h^v(\{\tau_{ji}\}^u_{(ji)\in E})$ is a monotonically decreasing function of each of its arguments $\{\tau_{ji}\}^u_{(ji)\in E}$. Further, the following result about monotone functions of random variables from [25] will be used to establish the result.

Lemma 4.

If $g(x_1,\dots,x_n)$ is a monotone function of each of its arguments, then, for a set $U_1,\dots,U_n$ of independent random numbers,

$\operatorname{Cov}\big(g(U_1,\dots,U_n),\, g(1-U_1,\dots,1-U_n)\big) \leq 0.$ (33)

Now, consider the variance of $\frac{\hat{h}^v(L_u)+\hat{h}^v(L_{s/2+u})}{2}$, where $L_u$ and $L_{s/2+u}$ are the pair of correlated transmission time sets defined in (11) and (12).

$\operatorname{Var}\Big(\frac{\hat{h}^v(L_u)+\hat{h}^v(L_{s/2+u})}{2}\Big) = \frac{1}{4}\Big(\operatorname{Var}\big(\hat{h}^v(L_u)\big) + \operatorname{Var}\big(\hat{h}^v(L_{s/2+u})\big) + 2\operatorname{Cov}\big(\hat{h}^v(L_u),\hat{h}^v(L_{s/2+u})\big)\Big)$ (34)

$= \frac{1}{2}\Big(\operatorname{Var}\big(\hat{h}^v(L_u)\big) + \operatorname{Cov}\big(\hat{h}^v(L_u),\hat{h}^v(L_{s/2+u})\big)\Big)$ (35)

(since $\hat{h}^v(L_u)$ and $\hat{h}^v(L_{s/2+u})$ are identically distributed).

Now consider $\operatorname{Cov}\big(\hat{h}^v(L_u),\hat{h}^v(L_{s/2+u})\big)$. By the law of total covariance,

$\operatorname{Cov}\big(\hat{h}^v(L_u),\hat{h}^v(L_{s/2+u})\big) = \mathbb{E}\big\{\operatorname{Cov}\big(\hat{h}^v(L_u),\hat{h}^v(L_{s/2+u})\,\big|\,\{U^u_{ji}\}_{(ji)\in E}\big)\big\} + \operatorname{Cov}\big(\mathbb{E}\{\hat{h}^v(L_u)|\{U^u_{ji}\}_{(ji)\in E}\},\, \mathbb{E}\{\hat{h}^v(L_{s/2+u})|\{U^u_{ji}\}_{(ji)\in E}\}\big)$ (36)

$= \operatorname{Cov}\big(\mathbb{E}\{\hat{h}^v(L_u)|\{U^u_{ji}\}_{(ji)\in E}\},\, \mathbb{E}\{\hat{h}^v(L_{s/2+u})|\{U^u_{ji}\}_{(ji)\in E}\}\big)$ (37)

(since $\hat{h}^v(L_u)$ and $\hat{h}^v(L_{s/2+u})$ are uncorrelated given $\{U^u_{ji}\}_{(ji)\in E}$)

$= \operatorname{Cov}\big(h^v(L_u),\, h^v(L_{s/2+u})\big)$ (38)

(from the conditional unbiasedness proved in [15])

$= \operatorname{Cov}\Big(h^v(\{F_{tt}^{-1}(U^u_{ji})\}_{(ji)\in E}),\, h^v(\{F_{tt}^{-1}(1-U^u_{ji})\}_{(ji)\in E})\Big).$ (39)

$F_{tt}^{-1}(\cdot)$ is a monotone function (the inverse of a CDF) and $h^v(\{\tau_{ji}\}^u_{(ji)\in E})$ is also monotone in each of its arguments $\{\tau_{ji}\}^u_{(ji)\in E}$. Hence, the composite function $h^v(\{F_{tt}^{-1}(U^u_{ji})\}_{(ji)\in E})$ is monotone in each of its arguments $\{U^u_{ji}\}_{(ji)\in E}$ (because the composition of monotone functions is monotone). Then, from Lemma 4, it follows that

$\operatorname{Cov}\big(\hat{h}^v(L_u),\hat{h}^v(L_{s/2+u})\big) \leq 0.$ (40)

Then, from (35), it follows that,

$\operatorname{Var}\Big(\frac{\hat{h}^v(L_u)+\hat{h}^v(L_{s/2+u})}{2}\Big) \leq \frac{1}{2}\operatorname{Var}\big(\hat{h}^v(L_u)\big).$ (41)

Then, by applying the total variance formula to the left hand side of (41) and, using the fact Var{h^v(Lu)|Lu}=h2(Lu)m2\operatorname*{Var}\{\hat{h}^{v}(L_{u})|L_{u}\}=\frac{h^{2}(L_{u})}{m-2}(from [15]), we get,

Var(h^v(Lu)+h^(Ls/2+u)2)12(σG(v,T)2m2+(m1)Var(|𝒩G(v,T)|)m2)\operatorname*{Var}\bigg{(}\frac{\hat{h}^{v}(L_{u})+\hat{h}(L_{s/2+u})}{2}\bigg{)}\leq\\ \frac{1}{2}\bigg{(}{\frac{\sigma_{G}(v,T)^{2}}{m-2}+\frac{(m-1)\operatorname*{Var}(|\mathcal{N}_{G}(v,T)|)}{m-2}}\bigg{)} (42)

and, the proof follows by noting that X(v,G)X(v,G) is the average of h^v(Lu)+h(Ls/2+u)2\frac{\hat{h}^{v}(L_{u})+h(L_{s/2+u})}{2} for u=1,,s/2u=1,\cdots,s/2.
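As a concrete illustration of the variance reduction achieved by the coupled pairs (L_{u}, L_{s/2+u}), consider the following sketch: a cascade spreads from the seed along a directed path with i.i.d. Exp(1) transmission delays (an illustrative choice of topology and delay distribution), and the exact cascade size h is used in place of the neighborhood-size estimates \hat{h}^{v} of [15]. The antithetic estimator attains a smaller variance than plain Monte Carlo with the same sample budget.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy cascade: the seed infects nodes 1,2,... along a directed path; edge
# delays are tau = F^{-1}(U) with F the Exp(1) CDF. h counts nodes reached by
# the deadline T and is monotonically decreasing in every delay (cf. Lemma 4).
n_edges, T = 20, 5.0

def h(taus):
    return np.sum(np.cumsum(taus, axis=-1) <= T, axis=-1)

def F_inv(u):
    return -np.log1p(-u)               # inverse CDF of Exp(1)

s = 10_000                             # total sample budget (s/2 antithetic pairs)
U = rng.random((s // 2, n_edges))

# Antithetic estimator: average over coupled pairs (U, 1-U), as in (11)-(12).
pairs = 0.5 * (h(F_inv(U)) + h(F_inv(1.0 - U)))
# Plain Monte Carlo estimator with s independent transmission time sets.
plain = h(F_inv(rng.random((s, n_edges))))

print("variance of antithetic mean :", pairs.var(ddof=1) / (s // 2))
print("variance of independent mean:", plain.var(ddof=1) / s)
```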

The proof of Part II of Theorem 1 follows from arguments similar to the above and is hence omitted.

Appendix B Proof of Theorem 2

The following result from [30] will be used to establish the weak convergence of the sequence \{\theta_{k}\}_{k\geq 0} obtained in Algorithm 2.

Consider the stochastic approximation algorithm,

\theta_{k+1}=\theta_{k}+\epsilon H(\theta_{k},x_{k}),\quad k=0,1,\dots (43)

where \epsilon>0 is a constant step size, \{x_{k}\} is a random process and \theta_{k}\in\mathbb{R}^{p} is the estimate generated at time k=0,1,\dots. Further, let

\theta^{\epsilon}(t)=\theta_{k}\text{ for }t\in[k\epsilon,(k+1)\epsilon),\quad k=0,1,\dots, (44)

which is a piecewise constant interpolation of \{\theta_{k}\}. In this setting, the following result holds.

Theorem 5.

Consider the stochastic approximation algorithm (43). Assume

  • SA1:

H(\theta,x) is uniformly bounded for all \theta\in\mathbb{R}^{p} and x\in\mathbb{R}^{q}.

  • SA2:

For any l\geq 0, there exists h(\theta) such that

\frac{1}{N}\sum_{k=l}^{N+l-1}\mathbb{E}_{l}\{H(\theta,x_{k})\}\rightarrow h(\theta)\text{ as }N\rightarrow\infty, (45)

where \mathbb{E}_{l}\{\cdot\} denotes the conditional expectation given the sigma-algebra generated by \{x_{k}:k<l\}.

  • SA3:

    The ordinary differential equation (ODE)

\frac{d\theta(t)}{dt}=h(\theta(t)),\quad\theta(0)=\theta_{0} (46)

    has a unique solution for every initial condition.

Then, the interpolated estimate \theta^{\epsilon}(t) defined in (44) satisfies

\lim_{\epsilon\to 0}\mathbb{P}\big(\sup_{0\leq t\leq T}|\theta^{\epsilon}(t)-\theta(t)|\geq\eta\big)=0\text{ for all }T>0,\eta>0, (47)

where \theta(t) is the solution of the ODE (46).

The condition SA1 in Theorem 5 can be replaced by a uniform integrability condition, and the result still holds [32].
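To illustrate Theorem 5, the following sketch runs the constant step size recursion (43) driven by a two-state Markov chain and compares the interpolated iterates (44) with the solution of the limit ODE (46). The update field H, the transition matrix P and the step size \epsilon are illustrative assumptions, not the quantities appearing in Algorithm 2.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative two-state Markov chain {x_k} with values (0, 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
vals = np.array([0.0, 1.0])
pi_stat = np.array([2 / 3, 1 / 3])   # stationary distribution of P
x_bar = pi_stat @ vals               # stationary mean = 1/3

def H(theta, x):
    return np.clip(x - theta, -10.0, 10.0)   # uniformly bounded field (SA1)

eps, n_steps = 1e-3, 20_000
theta, state = 0.0, 0
thetas = np.empty(n_steps)
for k in range(n_steps):
    thetas[k] = theta
    theta += eps * H(theta, vals[state])     # recursion (43)
    state = rng.choice(2, p=P[state])        # Markovian noise process

# Averaged field: h(theta) = x_bar - theta (SA2); the ODE (46)
# d(theta)/dt = x_bar - theta then has the unique solution below (SA3).
t = eps * np.arange(n_steps)                 # interpolated time scale, cf. (44)
ode = x_bar + (thetas[0] - x_bar) * np.exp(-t)
print("max |theta^eps(t) - theta(t)| =", np.abs(thetas - ode).max())
```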

Next, we show how Algorithm 2 fulfills the assumptions SA1, SA2, SA3 in Theorem 5. Detailed steps of similar proofs related to stochastic approximation algorithms can be found in [40] and Chapter 17 of [32].

Consider \sup_{k}\mathbb{E}||\hat{\nabla}{C_{k}(\theta_{k})}||, where \hat{\nabla}{C_{k}(\theta_{k})} is the gradient estimate defined in Algorithm 2:

\sup_{k}\mathbb{E}||\hat{\nabla}{C_{k}(\theta_{k})}||=\sup_{k}\mathbb{E}\bigg{|}\bigg{|}\bigg(\frac{1}{\bar{N}}\sum_{n=2kN}^{2kN+\bar{N}-1}\hat{c}(\theta,G_{n})-\frac{1}{\bar{N}}\sum_{n=(2k+1)N}^{(2k+1)N+\bar{N}-1}\hat{c}(\theta,G_{n})\bigg)\frac{d_{k}}{2\Delta}\bigg{|}\bigg{|} (48)

=\sup_{k}\frac{\sqrt{M}}{2\Delta\bar{N}}\mathbb{E}\bigg{|}\sum_{n=2kN}^{2kN+\bar{N}-1}\hat{c}(\theta,G_{n})-\sum_{n=(2k+1)N}^{(2k+1)N+\bar{N}-1}\hat{c}(\theta,G_{n})\bigg{|} (49)

(since the perturbation direction d_{k}\in\{-1,+1\}^{M} satisfies ||d_{k}||=\sqrt{M})

\leq\sup_{k}\frac{\sqrt{M}}{2\Delta\bar{N}}\mathbb{E}\bigg\{\sum_{n=2kN}^{(2k+1)N+\bar{N}-1}\big{|}\hat{c}(\theta,G_{n})\big{|}\bigg\} (51)

(by the triangle inequality)

=\sup_{k}\frac{\sqrt{M}}{2\Delta\bar{N}}\sum_{n=2kN}^{(2k+1)N+\bar{N}-1}\mathbb{E}\{\hat{c}(\theta,G_{n})\} (52)

(since \hat{c}(\theta,G_{n})\geq 0)

=\sup_{k}\frac{\sqrt{M}}{2\Delta\bar{N}}\sum_{n=2kN}^{(2k+1)N+\bar{N}-1}\mathbb{E}\{{c}(\theta,G_{n})\} (53)

(by conditioning on G_{n} and using Part II of Theorem 1)

\leq\sup_{k}\frac{\sqrt{M}}{2\Delta\bar{N}}\sum_{n=2kN}^{(2k+1)N+\bar{N}-1}\big\{\max_{v\in V,G\in\mathcal{G}}\sigma_{G}(v,T)\big\} (54)

=\frac{\sqrt{M}}{\Delta}\max_{v\in V,G\in\mathcal{G}}\sigma_{G}(v,T) (55)

(the maximum exists since V and \mathcal{G} are finite sets).

Hence the uniform integrability condition (the alternative to SA1) is fulfilled.

Next, note that C_{k}(\theta_{k}) is an asymptotically (as N tends to infinity) unbiased estimate of C(\theta_{k}), by the uniform integrability together with the almost sure convergence given by the law of large numbers for ergodic Markov chains. Therefore, as the perturbation size \Delta in (17) tends to zero, the gradient estimate \hat{\nabla}{C_{k}(\theta_{k})} in Algorithm 2 fulfills the condition SA2.

SA3 is fulfilled by the (global) Lipschitz continuity of the gradient \nabla_{\theta}C(\theta), which is a sufficient condition for the existence of a unique solution of a nonlinear ODE (for any initial condition) [41].
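For intuition on how gradient estimates of the form (48) drive the recursion, here is a sketch of a generic simultaneous perturbation (SPSA) gradient estimate with \pm 1 perturbation directions d_{k} and window-averaged cost evaluations. The noisy quadratic cost is a hypothetical stand-in for \hat{c}(\theta,G_{n}), and the sketch ignores details of Algorithm 2 such as the constraint that \theta parameterizes a probability distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
M, Delta, N_bar = 3, 0.1, 50       # dimension, perturbation size, window length

def c_hat(theta):
    # Hypothetical noisy cost oracle standing in for \hat{c}(theta, G_n):
    # a smooth cost plus zero-mean observation noise.
    return np.sum((theta - 0.5) ** 2) + 0.1 * rng.standard_normal()

def spsa_gradient(theta):
    d = rng.choice([-1.0, 1.0], size=M)           # perturbation direction d_k
    c_plus = np.mean([c_hat(theta + Delta * d) for _ in range(N_bar)])
    c_minus = np.mean([c_hat(theta - Delta * d) for _ in range(N_bar)])
    return (c_plus - c_minus) * d / (2 * Delta)   # cf. the estimate in (48)

theta = np.zeros(M)
for k in range(2000):
    theta -= 1e-2 * spsa_gradient(theta)   # constant step size descent, as in (43)
print("theta after SPSA descent:", theta)  # approaches the minimizer (0.5, ...)
```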

Appendix C Proof of Theorem 3

Consider the expected value of the finite-sample estimate of the influence function (the output of Algorithm 3):

\mathbb{E}\{\hat{C}^{\theta}_{N}\}=\mathbb{E}\bigg\{\frac{\sum_{n=1}^{N}\hat{c}^{T}\pi_{n}^{\theta}}{N}\bigg\} (56)

=\frac{c^{T}}{N}\sum_{n=1}^{N}\mathbb{E}\{\pi_{n}^{\theta}\} (57)

(by the unbiasedness established in Theorem 1)

=\frac{c^{T}}{N}\sum_{n=1}^{N}({P_{\theta}}^{n})^{T}\pi_{0}. (58)

Then, the result follows by noting that (58) is the Cesàro mean of the convergent sequence \{({P_{\theta}}^{n})^{T}\pi_{0}\}_{n\geq 1} and, hence, converges to the same limit.
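The Cesàro argument can be checked numerically: for an ergodic transition matrix, the running average of \pi_{n}^{\theta}=({P_{\theta}}^{n})^{T}\pi_{0} approaches the limit of the sequence, i.e., the stationary distribution. A minimal sketch, with an illustrative choice of P_{\theta}:

```python
import numpy as np

# Cesaro mean of the distribution sequence pi_n = (P^n)^T pi_0, as in (58).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])    # illustrative ergodic transition matrix P_theta
pi_0 = np.array([1.0, 0.0, 0.0])

pi_n, running = pi_0, []
for n in range(2000):
    pi_n = P.T @ pi_n              # pi_{n+1} = P^T pi_n
    running.append(pi_n)
cesaro = np.mean(running, axis=0)  # (1/N) sum_n (P^n)^T pi_0

# Stationary distribution: normalized left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi_inf = np.real(V[:, np.argmax(np.real(w))])
pi_inf /= pi_inf.sum()
print("Cesaro mean         :", cesaro)
print("stationary direction:", pi_inf)   # the two agree closely
```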

The weak convergence of the sequence \{\theta_{k}\}_{k\geq 0} (with HMM filter estimates) again follows from Theorem 5. Since the argument closely parallels the proof of Theorem 2, it is omitted.

References

  • [1] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 137–146, ACM, 2003.
  • [2] M. Hamdi, V. Krishnamurthy, and G. Yin, “Tracking a Markov-modulated stationary degree distribution of a dynamic random graph,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6609–6625, 2014.
  • [3] S. Bharathi, D. Kempe, and M. Salek, “Competitive influence maximization in social networks,” in International Workshop on Web and Internet Economics, pp. 306–311, Springer, 2007.
  • [4] A. Borodin, Y. Filmus, and J. Oren, “Threshold models for competitive influence in social networks,” in WINE, vol. 6484, pp. 539–550, Springer, 2010.
  • [5] T. Carnes, C. Nagarajan, S. M. Wild, and A. Van Zuylen, “Maximizing influence in a competitive social network: a follower’s perspective,” in Proceedings of the ninth international conference on Electronic commerce, pp. 351–360, ACM, 2007.
  • [6] W. Chen, A. Collins, R. Cummings, T. Ke, Z. Liu, D. Rincon, X. Sun, Y. Wang, W. Wei, and Y. Yuan, “Influence maximization in social networks when negative opinions may emerge and propagate,” in Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 379–390, SIAM, 2011.
  • [7] L. Seeman and Y. Singer, “Adaptive seeding in social networks,” in Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pp. 459–468, IEEE, 2013.
  • [8] T. Horel and Y. Singer, “Scalable methods for adaptively seeding a social network,” in Proceedings of the 24th International Conference on World Wide Web, pp. 441–451, International World Wide Web Conferences Steering Committee, 2015.
  • [9] S. L. Feld, “Why your friends have more friends than you do,” American Journal of Sociology, vol. 96, no. 6, pp. 1464–1477, 1991.
  • [10] S. Lattanzi and Y. Singer, “The power of random neighbors in social networks,” in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 77–86, ACM, 2015.
  • [11] W. Chen, Y. Wang, and S. Yang, “Efficient influence maximization in social networks,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 199–208, ACM, 2009.
  • [12] W. Chen, C. Wang, and Y. Wang, “Scalable influence maximization for prevalent viral marketing in large-scale social networks,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1029–1038, ACM, 2010.
  • [13] N. Du, Y. Liang, M.-F. Balcan, M. Gomez-Rodriguez, H. Zha, and L. Song, “Scalable influence maximization for multiple products in continuous-time diffusion networks,” Journal of Machine Learning Research, vol. 18, no. 2, pp. 1–45, 2017.
  • [14] H. Zhuang, Y. Sun, J. Tang, J. Zhang, and X. Sun, “Influence maximization in dynamic social networks,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 1313–1318, IEEE, 2013.
  • [15] E. Cohen, “Size-estimation framework with applications to transitive closure and reachability,” Journal of Computer and System Sciences, vol. 55, no. 3, pp. 441–453, 1997.
  • [16] W. Chen, Y. Yuan, and L. Zhang, “Scalable influence maximization in social networks under the linear threshold model,” in Data Mining (ICDM), 2010 IEEE 10th International Conference on, pp. 88–97, IEEE, 2010.
  • [17] M. G. Rodriguez, D. Balduzzi, and B. Schölkopf, “Uncovering the temporal dynamics of diffusion networks,” arXiv preprint arXiv:1105.0697, 2011.
  • [18] V. Krishnamurthy, O. N. Gharehshiran, M. Hamdi, et al., “Interactive sensing and decision making in social networks,” Foundations and Trends® in Signal Processing, vol. 7, no. 1-2, pp. 1–196, 2014.
  • [19] A. H. Sayed et al., “Adaptation, learning, and optimization over networks,” Foundations and Trends® in Machine Learning, vol. 7, no. 4-5, pp. 311–801, 2014.
  • [20] O. N. Gharehshiran, V. Krishnamurthy, and G. Yin, “Distributed tracking of correlated equilibria in regime switching noncooperative games,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2435–2450, 2013.
  • [21] M. G. Rodriguez and B. Schölkopf, “Influence maximization in continuous time diffusion networks,” arXiv preprint arXiv:1205.1682, 2012.
  • [22] A. E. Clementi, F. Pasquale, A. Monti, and R. Silvestri, “Information spreading in stationary Markovian evolving graphs,” in Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pp. 1–12, IEEE, 2009.
  • [23] A. Clementi, P. Crescenzi, C. Doerr, P. Fraigniaud, F. Pasquale, and R. Silvestri, “Rumor spreading in random evolving graphs,” Random Structures & Algorithms, vol. 48, no. 2, pp. 290–312, 2016.
  • [24] A. E. Clementi, A. Monti, F. Pasquale, and R. Silvestri, “Broadcasting in dynamic radio networks,” Journal of Computer and System Sciences, vol. 75, no. 4, pp. 213–230, 2009.
  • [25] S. M. Ross, Simulation. Elsevier, 2013.
  • [26] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Transactions on Automatic Control, vol. 37, no. 3, pp. 332–341, 1992.
  • [27] J. C. Spall, “Simultaneous perturbation stochastic approximation,” Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, pp. 176–207, 2003.
  • [28] R. Durrett, Probability: theory and examples. Cambridge University Press, 2010.
  • [29] J. R. Norris, Markov chains. No. 2, Cambridge University Press, 1998.
  • [30] H. J. Kushner and G. Yin, Stochastic approximation and recursive algorithms and applications. No. 35 in Applications of mathematics, New York: Springer, 2003.
  • [31] V. Krishnamurthy, M. Maskery, and G. Yin, “Decentralized adaptive filtering algorithms for sensor activation in an unattended ground sensor network,” IEEE Transactions on Signal Processing, vol. 56, no. 12, pp. 6086–6101, 2008.
  • [32] V. Krishnamurthy, Partially Observed Markov Decision Processes. Cambridge University Press, 2016.
  • [33] Y. Zhao, E. Levina, J. Zhu, et al., “Consistency of community detection in networks under degree-corrected stochastic block models,” The Annals of Statistics, vol. 40, no. 4, pp. 2266–2292, 2012.
  • [34] E. Abbe, A. S. Bandeira, and G. Hall, “Exact recovery in the stochastic block model,” IEEE Transactions on Information Theory, vol. 62, no. 1, pp. 471–487, 2016.
  • [35] K. Rohe, S. Chatterjee, B. Yu, et al., “Spectral clustering and the high-dimensional stochastic blockmodel,” The Annals of Statistics, vol. 39, no. 4, pp. 1878–1915, 2011.
  • [36] M. Lelarge, L. Massoulié, and J. Xu, “Reconstruction in the labelled stochastic block model,” IEEE Transactions on Network Science and Engineering, vol. 2, no. 4, pp. 152–163, 2015.
  • [37] D. E. Fishkind, D. L. Sussman, M. Tang, J. T. Vogelstein, and C. E. Priebe, “Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown,” SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 1, pp. 23–39, 2013.
  • [38] B. Karrer and M. E. Newman, “Stochastic blockmodels and community structure in networks,” Physical Review E, vol. 83, no. 1, p. 016107, 2011.
  • [39] B. Wilder, N. Immorlica, E. Rice, and M. Tambe, “Influence maximization with an unknown network by exploiting community structure,” 2017.
  • [40] V. Krishnamurthy and G. G. Yin, “Recursive algorithms for estimation of hidden Markov models and autoregressive models with Markov regime,” IEEE Transactions on Information Theory, vol. 48, no. 2, pp. 458–476, 2002.
  • [41] H. K. Khalil, Nonlinear Systems. Prentice-Hall, New Jersey, 1996.