Influence-aware Task Assignment in Spatial Crowdsourcing (Technical Report)

Xuanhao Chen1, Yan Zhao2, Kai Zheng1*, Bin Yang2, Christian S. Jensen2
1University of Electronic Science and Technology of China, China
2Department of Computer Science, Aalborg University, Denmark
xhc@std.uestc.edu.cn, yanz@cs.aau.dk, zhengkai@uestc.edu.cn, {byang, csj}@cs.aau.dk
*Corresponding author: Kai Zheng.
Abstract

With the widespread diffusion of smartphones, Spatial Crowdsourcing (SC), which aims to assign spatial tasks to mobile workers, has drawn increasing attention in both academia and industry. One of the major issues is how to best assign tasks to workers. Given a worker and a task, the worker will choose to accept the task based on her affinity towards the task, and the worker can propagate the information of the task to attract more workers to perform it. These factors can be measured as worker-task influence. Since workers’ affinities towards tasks differ and task issuers may ask workers who have performed tasks to propagate information about those tasks to attract more workers, it is important to analyze worker-task influence when making assignments. We propose and solve a novel influence-aware task assignment problem in SC, where tasks are assigned to workers in a manner that achieves high worker-task influence. In particular, we aim to maximize the number of assigned tasks and worker-task influence. To solve the problem, we first determine workers’ affinities towards tasks by identifying workers’ historical task-performing patterns. Next, a Historical Acceptance approach is developed to measure workers’ willingness to perform a task, i.e., the probability of workers visiting the location of the task when they are informed. We then propose a Random reverse reachable-based Propagation Optimization algorithm that exploits reverse reachable sets to calculate the probability of workers being informed about tasks in a social network. Based on the worker-task influence derived from the above three factors, we propose three influence-aware task assignment algorithms that aim to maximize the number of assigned tasks and worker-task influence. Extensive experiments on two real-world datasets offer detailed insight into the effectiveness of our solutions.

Index Terms:
worker-task influence, task assignment, spatial crowdsourcing

I Introduction

With the near-ubiquitous diffusion of smartphones and similar devices, a new kind of crowdsourcing has emerged, namely Spatial Crowdsourcing (SC), where smartphone users serve as workers that perform tasks at specific physical locations. In SC, examples of spatial tasks include reporting local hot spots, taking photos or videos of a POI, and monitoring traffic conditions [1].

SC has received substantial attention in recent years [2, 3, 4, 5, 6, 7, 8, 9, 10]. Studies exist that aim to maximize the total number of completed tasks [11], the diversity score of assignments [12], the number of completed tasks for a worker with an optimal schedule [13], etc. These studies generally focus on the spatio-temporal information of workers and tasks during task assignment, while they do not consider worker-task influence, i.e., how to ensure that assigned tasks satisfy workers’ affinities towards tasks and are well-known among workers who are likely to visit the locations of tasks. Visiting the location of a task is equivalent to accepting the task. In real-world scenarios, different workers prefer different kinds of tasks. Moreover, when completing tasks, workers can propagate information on available tasks to their friends through social networks. Workers who are informed can choose to perform tasks based on their historical task-performing patterns. It is important to analyze such phenomena when assigning tasks. For example, the owner of a new restaurant may want to publish a leaflet distribution task to promote the restaurant as widely as possible. Some free meal coupons and VIP cards are offered to workers who accept the task and help to propagate the news about the restaurant. If we only consider spatio-temporal information, an available worker who is close to the restaurant at the current time will be assigned the task, but the worker may not be able to promote the restaurant widely. Thus, the promotion will not be successful. In addition, the real-time locations of workers are temporary, so relying on them alone ignores the workers’ historical task-performing patterns. Moreover, by analyzing the social networks that workers are in, we can obtain valuable insights about interactions among workers, which can be further utilized to improve the quality of spatial task assignments.

Recent studies have explored the effects of social impact in task assignment, where social network features are used to extract preference of worker groups [14, 15]. However, these studies do not consider the interactions among workers, which include information propagation patterns and social network structures. Different workers have different abilities to propagate information [16] and different probabilities to visit the locations of tasks [17]. This indicates that different workers contribute differently to worker-task influence. Moreover, it is important to infer task execution behaviors based on historical task-performing records of workers. Several approaches use past task-performing patterns to deduce worker preferences for tasks [18, 1], but they do not analyze the willingness of workers to perform tasks, i.e., the probabilities that workers will visit the task locations. If a worker previously performed tasks near a new task, the worker is more likely to visit the location of the new task [19]. Lastly, we are not aware of any existing task assignment techniques that combine social networks and historical task-performing patterns to determine worker-task influence, which is a key factor for improving the quality of task assignment in SC.

To address these challenges, we propose the Influence-aware Task Assignment (ITA) problem, where the objective is to assign tasks to suitable workers so as to maximize both the total number of assigned tasks and worker-task influence. Worker-task influence consists of workers’ affinities towards tasks (worker-task affinity), the probability of workers visiting the locations of tasks (worker willingness), and the probability of workers being informed about tasks in social networks (worker propagation). Larger worker-task influence means that workers’ affinities towards tasks are larger and that more people are willing to visit the locations of tasks after being informed. An example of the ITA problem is illustrated in Figure 1. Workers $w_{1}$, $w_{2}$, and $w_{3}$ performed tasks $s_{1}$, $s_{2}$, and $s_{3}$, respectively, at time $t_{1}$. At time $t_{2}$, workers $w_{4}$ and $w_{5}$ are online, and tasks $s_{4}$ and $s_{5}$ become available. These are tasks published by new restaurants that ask workers to take photos and then advertise the restaurants on social media. The requirement of the tasks is to increase the number of people who are willing to visit the location of the restaurant after learning about it, i.e., to enlarge worker-task influence. The circle around each worker denotes the reachable region of the worker at the current time. Because of the restaurants’ budgets, only one worker is required to perform each task, and the worker who is assigned the task should enlarge worker-task influence. A simple greedy approach is to assign each task to the nearest worker, which gives the task assignment $\{(s_{4},w_{3}),(s_{5},w_{5})\}$, where the value of worker-task influence is $1.67+0.85=2.52$. However, adopting an influence-aware task assignment approach, we can achieve a higher worker-task influence with assignment $\{(s_{4},w_{4}),(s_{5},w_{5})\}$, where the value of worker-task influence is $4.25+0.85=5.1$.

Figure 1: Running Example

Worker-task influence can be computed from worker-task affinity (i.e., workers’ affinities towards tasks), worker willingness (i.e., the probability of workers visiting the locations of tasks), and worker propagation (i.e., the probability of workers being informed about tasks in social networks). However, a challenge remains: how to combine worker-task influence with existing objectives such as maximizing the number of assigned tasks. In other words, influence-aware assignment should optimize for worker-task influence without sacrificing other objectives. To achieve this, we propose a Data-driven Influence-aware Task Assignment (DITA) framework, consisting of two primary components. First, worker-task influence is calculated, which not only considers online interactions among workers, but also captures workers’ historical task-performing patterns and the real-time task assignment mode. Second, we design three algorithms to maximize the overall number of task assignments while giving higher priority to workers who generate higher worker-task influence at every time instance.

The paper’s contributions can be summarized as follows:

i) We formalize and study an Influence-aware Task Assignment (ITA) problem in the context of SC. To the best of our knowledge, this is the first study in SC that considers worker-task influence and assigns tasks based on the influence.

ii) We calculate worker-task influence by taking into account worker-task affinity, worker willingness and worker propagation.

iii) We design three alternative algorithms to solve the ITA problem, including basic Influence-aware Assignment, Entropy-based Influence-aware Assignment, and Distance-based Influence-aware Assignment.

iv) We conduct extensive experiments on two real-world datasets to offer insight into the effectiveness of the proposed methods.

II Problem Statement

We present necessary preliminaries, define the problem addressed, and give an overview of our solution framework. Table I lists notation used throughout the paper.

II-A Preliminary Concepts

Definition 1 (Spatial Task).

A spatial task, denoted by $s=(l,p,\varphi,C)$, has a location $s.l$, a publication time $s.p$, a valid time $s.\varphi$ (meaning that it will expire at $s.p+s.\varphi$), and multiple category labels $s.C$.

Definition 2 (Worker).

A worker, denoted by $w=(l,r)$, consists of a location $w.l$ and a reachable distance $w.r$. The reachable range of worker $w$ is a circle with center $w.l$ and radius $w.r$, within which $w$ can accept assignments.

A spatial task $s$ can be completed only if a worker arrives at its location before the expiration deadline $s.p+s.\varphi$. With single-task assignment mode, the SC server assigns each task to one worker at a time.
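Definitions 1 and 2 and the completion condition above can be sketched in Python as follows. This is only an illustrative sketch: the field names, the Euclidean distance, and the constant travel `speed` are assumptions made for the example and are not part of the formal definitions.

```python
import math
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class SpatialTask:
    location: Tuple[float, float]      # s.l
    publish_time: float                # s.p
    valid_time: float                  # s.phi
    categories: FrozenSet[str] = field(default_factory=frozenset)  # s.C

    @property
    def deadline(self) -> float:
        # The task expires at s.p + s.phi.
        return self.publish_time + self.valid_time


@dataclass(frozen=True)
class Worker:
    location: Tuple[float, float]      # w.l
    reach: float                       # w.r, radius of the reachable circle


def distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    # Assumption: planar Euclidean distance.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_feasible(worker: Worker, task: SpatialTask, now: float, speed: float) -> bool:
    """Check the spatio-temporal constraints for one worker-task pair.

    Assumption: the worker travels in a straight line at constant `speed`.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    d = distance(worker.location, task.location)
    within_reach = d <= worker.reach
    arrives_in_time = now + d / speed <= task.deadline
    return within_reach and arrives_in_time
```

A pair that fails either check is simply never offered to the assignment step.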

Definition 3 (Worker-Task Influence).

Given a worker $w$ and a task $s$, worker-task influence (calculated in Section III), denoted as $\mathit{if}(w,s)$, consists of $w$’s affinity towards $s$, the probability of other workers visiting the location of $s$ after being informed by $w$, and the probability of other workers being informed by $w$ through social networks.

TABLE I: Summary of Notation

| Symbol | Definition |
|---|---|
| $s$ | Spatial task |
| $s.l$ | Location of spatial task $s$ |
| $s.p$ | Publication time of spatial task $s$ |
| $s.\varphi$ | Valid time of spatial task $s$ |
| $s.C$ | Categories of spatial task $s$ |
| $S$ | A spatial task set |
| $w$ | Worker |
| $w.l$ | Location of worker $w$ |
| $w.r$ | Reachable distance of worker $w$ |
| $W$ | A worker set |
| $\mathit{if}(w,s)$ | Worker-task influence of worker $w$ and spatial task $s$ |
| $A$ | A task assignment |
| $\lvert A\rvert$ | The total number of assigned tasks in task assignment $A$ |
| $A_{opt}$ | The optimal task assignment |
| $\mathbb{A}$ | A task assignment set |

Definition 4 (Spatial Task Assignment).

Given a set of tasks $S$ and a set of workers $W$, a spatial task assignment, denoted by $A$, consists of a set of worker-task pairs of the form $(s,w)$, where task $s$ is assigned to worker $w$ satisfying the spatio-temporal constraints, and where each worker or task can be assigned at most once.

We use $|A|$ to denote the total number of assigned tasks in task assignment $A$. The problem investigated is stated as follows:

ITA Problem Statement. Given a set of workers and a set of tasks at the current time in an SC platform, our problem is to find a task assignment $A_{\mathit{opt}}$ that achieves the following goals:

1) primary optimization goal: maximize the total number of assigned tasks (i.e., $\forall A_{i}\in\mathbb{A}$: $|A_{i}|\leq|A_{\mathit{opt}}|$), where $\mathbb{A}$ denotes the set of all possible assignments; and

2) secondary optimization goal: maximize worker-task influence of assignments.
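The two goals form a lexicographic objective: assignment size first, total worker-task influence second. The following brute-force sketch makes this ordering concrete on a tiny instance. It is not one of the proposed algorithms; the function name and the dictionary encoding of feasible pairs are assumptions made for illustration, and the search is exponential, so it only serves as a reference on toy inputs.

```python
from itertools import permutations


def best_assignment(tasks, workers, influence):
    """Exhaustively find an assignment that is largest in size and, among
    those, largest in total influence.

    `influence` maps each feasible (task, worker) pair to if(w, s); pairs
    that are absent are treated as violating the spatio-temporal constraints.
    """
    best_pairs, best_key = [], (0, 0.0)
    # None stands for "leave this task unassigned".
    slots = list(workers) + [None] * len(tasks)
    for choice in permutations(slots, len(tasks)):
        # Each worker appears at most once in a permutation, so every
        # candidate respects the one-task-per-worker constraint.
        pairs = [(s, w) for s, w in zip(tasks, choice) if (s, w) in influence]
        key = (len(pairs), sum(influence[p] for p in pairs))
        if key > best_key:  # tuple comparison is lexicographic
            best_pairs, best_key = pairs, key
    return best_pairs, best_key


# Values from the running example; which pairs are feasible is assumed here.
influence = {("s4", "w3"): 1.67, ("s4", "w4"): 4.25, ("s5", "w5"): 0.85}
pairs, key = best_assignment(["s4", "s5"], ["w3", "w4", "w5"], influence)
```

On this toy input both candidate assignments contain two tasks, so the tie is broken by influence and the pair `("s4", "w4")` is preferred over `("s4", "w3")`.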

Lemma 1.

The ITA problem is NP-hard.

Proof.

We prove the lemma through a reduction from the 0-1 knapsack problem, which is described as follows: Given a set $U$ with $n$ items, in which each item $u_{i}$ is labelled with a weight $l_{i}$ and a value $h_{i}$, the 0-1 knapsack problem is to find a subset $U^{*}$ of $U$ that maximizes $\sum_{u_{i}\in U^{*}}h_{i}$ subject to $\sum_{u_{i}\in U^{*}}l_{i}\leq L$, where $L$ is the maximum weight capacity.

Consider the following instance of the ITA problem. Given a task set $S$ with $n$ tasks, each task $s_{i}\in S$ is associated with a worker (corresponding to the weight $l_{i}=1$ of the 0-1 knapsack problem). Here, the number of workers is sufficiently large. The value $h_{i}$ of each task $s_{i}$, which is a function related to task completion and worker-task influence, is at least as hard to handle as the constant $h_{i}$ in the 0-1 knapsack problem, so this difference does not make our problem easier. In addition, we have $L$ workers. Therefore, the ITA problem is to identify a task subset $S^{*}$ of $S$ that maximizes $\sum_{s_{i}\in S^{*}}h_{i}$ subject to $\sum_{s_{i}\in S^{*}}l_{i}\leq L$.

If the ITA problem instance could be solved in polynomial time, then a 0-1 knapsack problem could be solved in polynomial time by transforming it to the corresponding ITA problem instance. This contradicts the fact that the 0-1 knapsack problem is NP-hard [20], so there cannot be an efficient (i.e., polynomial-time) solution to the ITA problem instance, which is therefore NP-hard. Since the ITA problem instance is NP-hard, the ITA problem is also NP-hard.

II-B Framework Overview

We propose a framework, Data-driven Influence-aware Task Assignment (DITA), to solve the ITA problem. The framework has two components: worker-task influence modeling and task assignment, as shown in Figure 2.

The first component aims to calculate worker-task influence. Specifically, we employ Latent Dirichlet Allocation (LDA) to measure workers’ affinities (i.e., worker-task affinity) towards tasks, where we treat the categories of tasks that workers have already completed as documents to train the LDA model, and then the categories of tasks and workers at the current time are input into the trained LDA model to compute the worker-task affinity. For worker willingness calculation, we propose a Historical Acceptance (HA) algorithm to measure the probability of a worker visiting the location of a task based on the task-performing history of the worker and the real-time locations of the worker and task. For worker propagation, we first exploit an Independent Cascade (IC) model to simulate information propagation process of tasks in a given social network, and then we propose a Random reverse reachable-based Propagation Optimization (RPO) algorithm to calculate worker propagation based on IC and social networks.

In the task assignment component, considering the spatio-temporal constraints (i.e., the reachable regions of workers and expiration times of tasks) of workers and tasks, we optimize the task assignment based on worker-task influence at each time instance and propose a basic Influence-aware Assignment (IA) method. Taking worker-task influence and location entropy into account, we propose an Entropy-based Influence-aware Assignment (EIA) method. Moreover, a Distance-based Influence-aware Assignment (DIA) method which considers worker-task influence and workers’ travel costs is developed.

Figure 2: DITA Framework

III Worker-task Influence Calculation

We proceed to detail how to calculate worker-task influence for a worker and a task. We cover worker-task affinity, worker willingness, worker propagation and worker-task influence.

III-A Worker-Task Affinity Calculation

In SC, different workers exhibit different affinities (i.e., preferences) for the same categories of tasks, leading to different task-performing behaviors. For example, a worker may like to report a hot spot, while another worker may prefer to monitor traffic conditions. Since task categories contain semantic information (e.g., restaurant) and the Latent Dirichlet Allocation (LDA) model [21] performs well at modeling semantic affinity (i.e., semantic matching) between text documents by learning topics, we employ it to quantify worker-task affinity.

In LDA, a document is regarded as a set of words generated by several topics, where each topic is described by terms following a probability distribution. The modeling process can be formalized as follows:

$$P(v_{i}|d)=\sum_{j=1}^{|\mathit{Top}|}P(v_{i}|t_{j})P(t_{j}|d)$$

Here, $P(v_{i}|d)$ is the probability of term $v_{i}$ for a document $d$, and $|\mathit{Top}|$ is the number of topics. Next, $P(v_{i}|t_{j})$ is the probability of $v_{i}$ within topic $t_{j}$, and $P(t_{j}|d)$ is the probability of picking a term from $t_{j}$ in document $d$. LDA estimates the topic-term distribution, $P(v_{i}|t_{j})$, and the document-topic distribution, $P(t_{j}|d)$, using Dirichlet priors. It iterates multiple times over each term in $d$ until the parameters in LDA converge. This way, we get the topic distribution of each document. Each topic is a probability distribution over a set of words. In the LDA model, words that are related semantically have a high probability of belonging to the same topic.
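The mixture formula above can be checked numerically with a small sketch. The topic-term and document-topic values below are made up for illustration, and the function name is an assumption; a real LDA model would estimate both distributions from data.

```python
def document_term_distribution(topic_term, doc_topic):
    """Return P(v_i | d) for every term v_i.

    topic_term[j][i] holds P(v_i | t_j); doc_topic[j] holds P(t_j | d).
    """
    n_terms = len(topic_term[0])
    return [
        sum(doc_topic[j] * topic_term[j][i] for j in range(len(doc_topic)))
        for i in range(n_terms)
    ]


# Two topics over three terms (illustrative numbers only).
topic_term = [
    [0.7, 0.2, 0.1],   # P(v | t_1)
    [0.1, 0.3, 0.6],   # P(v | t_2)
]
doc_topic = [0.8, 0.2]  # P(t | d)

term_probs = document_term_distribution(topic_term, doc_topic)
```

Because each topic row and the document-topic vector sum to one, the resulting term probabilities also sum to one.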

In order to adapt LDA to model worker-task affinity, we treat each task category as a word, and we treat the categories of tasks in the historical task-performing records of worker $w_{i}$ as a document, denoted by $\mathit{dc}_{w_{i}}$. The documents across all workers on an SC platform form a set of documents that is used to train the LDA model, cf. Figure 3. Based on the documents, LDA can learn topics. Each topic is represented by a probability distribution over categories. For worker $w_{i}$ and task $s_{i}$ at the current time, we can use the trained LDA model to calculate the topic distribution, where the topic distribution of $w_{i}$ is calculated from the historical task-performing records that reflect the preferred category distribution of $w_{i}$, and the topic distribution of $s_{i}$ is calculated based on its categories. Then the learned topics and the document $\mathit{dc}_{s_{i}}$ formed by the categories of the location of $s_{i}$ are used to estimate the worker-task affinity, $P_{\mathit{aff}}$, as follows:

$$P_{\mathit{aff}}(w_{i},s_{i})=\sum\nolimits_{t\in\mathit{Top}}P(w_{i}|t)\cdot P(s_{i}|t),$$

where $t$ denotes a topic and $\mathit{Top}$ is a set of learned topics. Further, $P(w_{i}|t)$ and $P(s_{i}|t)$ quantify how well topic $t$ matches the topic distribution of $w_{i}$’s historical task-performing records and the topic distribution of task $s_{i}$, respectively. A larger $P_{\mathit{aff}}(w_{i},s_{i})$ value indicates that $w_{i}$ is more likely to perform $s_{i}$, since the preferred category distribution of $w_{i}$ and that of the task $s_{i}$ are better correlated.
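Once per-topic scores are available, the affinity reduces to a sum of products over topics. The sketch below assumes that the topic distributions inferred by a trained LDA model are used as the per-topic scores $P(w_{i}|t)$ and $P(s_{i}|t)$; the numbers and the function name are illustrative assumptions, not values from the experiments.

```python
def worker_task_affinity(worker_topics, task_topics):
    """P_aff(w, s) = sum over topics t of P(w | t) * P(s | t)."""
    if len(worker_topics) != len(task_topics):
        raise ValueError("both vectors must cover the same set of topics")
    return sum(pw * ps for pw, ps in zip(worker_topics, task_topics))


# Illustrative topic scores over three learned topics.
worker = [0.7, 0.2, 0.1]        # derived from the worker's history document
task_similar = [0.8, 0.1, 0.1]  # task whose categories match the history
task_other = [0.1, 0.1, 0.8]    # task dominated by a different topic

aff_similar = worker_task_affinity(worker, task_similar)
aff_other = worker_task_affinity(worker, task_other)
```

The task whose topic profile resembles the worker's history receives the larger score, which matches the interpretation given above.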

Figure 3: Worker-task Affinity Calculation

III-B Worker Willingness Calculation

In general, different workers exhibit different willingness to visit the location of a task. Previous studies only consider real-time locations of workers and tasks [22, 23] when assigning tasks. However, measuring a worker’s willingness to visit the location of a task solely by the distance between the worker’s real-time location and the task location gives an incomplete picture. The real-time location is temporary, and relying on it alone ignores the worker’s historical task-performing patterns.

To tackle this issue, we propose a Historical Acceptance (HA) approach to measure the willingness of a worker $w$ to visit the locations of particular tasks based on the worker’s historical task-performing records (denoted as $S_{w}$) and the real-time locations of workers and tasks, where $S_{w}=\{(s_{1},t^{a}_{s_{1}},t^{l}_{s_{1}}),(s_{2},t^{a}_{s_{2}},t^{l}_{s_{2}}),\ldots,(s_{n},t^{a}_{s_{n}},t^{l}_{s_{n}})\}$ and each triplet $(s_{i},t^{a}_{s_{i}},t^{l}_{s_{i}})$ consists of task $s_{i}$, a task arrival time $t^{a}_{s_{i}}$, and a task completion time $t^{l}_{s_{i}}$. The worker willingness is measured as the probability that a worker moves from the locations of the tasks they have performed to the location of the current task.

In particular, HA computes worker willingness in terms of stationary distribution modeling of workers’ historical mobility and movement probability density calculation.

III-B1 Stationary Distribution Modeling of Workers’ Historical Mobility

The stationary distribution of a worker’s historical mobility captures the probability that a worker $w$ stays at the location of a performed task $s_{i}$, denoted as $P_{w}(w,s_{i})$. This probability can be computed using the Random Walk with Restart (RWR) method, which is an efficient approach to simulating the movement of objects [24]. In order to adapt the RWR method to compute the stationary distribution of a worker’s historical mobility, we exploit workers’ historical task-performing records $S_{w}$ (ordered by check-in time) to construct an $n\times n$ weight matrix for worker $w$ ($w\in W$), where $n$ is the number of tasks performed by $w$. The weight of item $w^{\prime}_{ij}$ in the $i$-th row and $j$-th column is set to $1/\sum_{j}^{n}m_{ij}$, where $m_{ij}=1$ if $w$ performed tasks at the $j$-th location; otherwise, $m_{ij}=0$.
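To illustrate how an RWR stationary distribution can be obtained, the following sketch runs a plain power iteration. Several parts are assumptions rather than the construction defined above: the transition matrix is built from consecutive visits in the ordered history (a simplified stand-in for the weight matrix), the restart probability of 0.15 is a hypothetical setting, and the restart vector is uniform.

```python
def transition_from_history(locations):
    """Row-normalised counts of consecutive moves in an ordered visit list.

    Assumption: a location without outgoing moves jumps uniformly.
    """
    ids = sorted(set(locations))
    index = {loc: k for k, loc in enumerate(ids)}
    n = len(ids)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(locations, locations[1:]):
        counts[index[a]][index[b]] += 1.0
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total for c in row] if total else [1.0 / n] * n)
    return ids, rows


def rwr_stationary(transition, restart=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for p = restart * u + (1 - restart) * T^T p,
    where u is the uniform restart vector."""
    n = len(transition)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [
            restart / n
            + (1.0 - restart) * sum(p[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical history: the worker keeps returning to location "a".
ids, T = transition_from_history(["a", "b", "a", "c", "a"])
stay_prob = rwr_stationary(T)
```

Here the frequently revisited location receives the largest stationary probability, which is the behaviour the stay probability is meant to capture.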

III-B2 Movement Probability Density Calculation

The movement probability density of worker $w$ is the probability density of moving from the location of task $s_{i}$ to the location of the next task, $s_{i+1}$, denoted as $f_{w}(d(s_{i},s_{i+1}))$, where $f_{w}$ is a probability density function and $d(s_{i},s_{i+1})$ is the distance between the location of $s_{i}$ and the location of $s_{i+1}$. Previous studies [25, 26] show that the movements of workers are self-similar. Since the random variable described by the Pareto distribution obeys the self-similarity property [27], we choose the Pareto distribution to measure the movement probability density of worker $w$, denoted as $f_{w}(x;\pi;\omega)=\frac{\pi\omega^{\pi}}{x^{\pi+1}}$, where $x$ is the distance between locations of tasks, $\pi$ is a shape parameter that can be calculated using maximum likelihood estimation, and $\omega$ is the minimum value of $x$. Since real-time tasks are not known in advance, we use $S_{w}$ (ordered by check-in time) to compute $f_{w}$. Note that different task-performing orders will lead to different $f_{w}$. As a worker may perform several tasks at the same location, the minimum value of $d(s_{i},s_{i+1})$ is $0$, where $s_{i}$ and $s_{i+1}$ are tasks in $S_{w}$. We set $x_{i}$ of $f_{w}$ to $d(s_{i},s_{i+1})+1$ to avoid $x_{i}$ being $0$. In this case, $\omega=1$. Based on $x_{i}$, we employ maximum likelihood estimation to estimate $\pi$ of $f_{w}$. Further, $\pi$ can be calculated using the following equation:

$$\frac{d}{d\pi}\prod_{i}^{|S_{w}|-1}\frac{\pi}{x_{i}^{\pi+1}}=0$$

where $|S_{w}|$ is the number of performed tasks of $w$ and $x_{i}\geq 1$.

Accordingly, $\pi$ is given by Equation 1.

$$\pi=\frac{|S_{w}|-1}{\sum_{i}^{|S_{w}|-1}\ln x_{i}},\text{ where }\sum_{i}^{|S_{w}|-1}\ln x_{i}\neq 0\qquad(1)$$
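Equation 1 follows from maximising the log-likelihood: taking logarithms of the product gives $(|S_{w}|-1)\ln\pi-(\pi+1)\sum_{i}\ln x_{i}$, and setting its derivative with respect to $\pi$ to zero yields the closed form. A minimal sketch of this estimator is shown below; the function name and the input format (a list of consecutive distances) are assumptions made for illustration.

```python
import math


def pareto_shape(consecutive_distances):
    """Maximum likelihood estimate of the Pareto shape parameter (Equation 1).

    `consecutive_distances` holds d(s_i, s_{i+1}) for the ordered history,
    so it has |S_w| - 1 entries. Each distance is shifted by 1, which makes
    the minimum value omega equal to 1.
    """
    xs = [d + 1.0 for d in consecutive_distances]
    log_sum = sum(math.log(x) for x in xs)
    if log_sum == 0:
        # All moves have zero length: Equation 1 is undefined in this case.
        raise ValueError("sum of ln(x_i) is zero; shape is undefined")
    return len(xs) / log_sum


# Distances chosen so that ln(x_1) = 1 and ln(x_2) = 2, giving a shape of 2/3.
shape = pareto_shape([math.e - 1.0, math.e ** 2 - 1.0])
```

A larger shape value corresponds to a worker whose consecutive tasks are close together, so long moves are estimated to be less likely.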

Since a worker may stay at different locations of performed tasks, we first need to compute the probability of worker $w$ staying at the location of a performed task $s_{i}$, and then we combine the probability with the probability that $w$ moves from the location of $s_{i}$ to the location of the current task $s$ to compute worker willingness. Based on Sections III-B1 and III-B2, the willingness of $w$ to visit the location of $s$ can be calculated as follows:

$$\begin{aligned}P_{\mathit{wil}}(w,s)&=\sum\limits_{s_{i}}^{S_{w}}P_{w}(w,s_{i})\cdot\int_{d(s_{i},s)}^{\infty}f_{w}(x)\,dx\\&=\sum\limits_{s_{i}}^{S_{w}}P_{w}(w,s_{i})\cdot(d(s_{i},s)+1)^{-\pi}\end{aligned}\qquad(2)$$
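The closed form in Equation 2 uses the fact that, for a Pareto density with $\omega=1$, the tail probability beyond a shifted distance $y$ is $y^{-\pi}$. The sketch below evaluates the second line of Equation 2 directly; the function name and the example numbers are illustrative assumptions.

```python
def worker_willingness(stay_probs, distances_to_task, shape):
    """Evaluate Equation 2 in closed form.

    stay_probs[i]        : P_w(w, s_i), probability of staying at task s_i
    distances_to_task[i] : d(s_i, s), distance from s_i to the current task
    shape                : Pareto shape parameter pi of the worker
    """
    if len(stay_probs) != len(distances_to_task):
        raise ValueError("one distance is required per historical task")
    return sum(
        p * (d + 1.0) ** (-shape)
        for p, d in zip(stay_probs, distances_to_task)
    )


# Two historical locations, one at the task itself and one a unit away.
willingness = worker_willingness([0.5, 0.5], [0.0, 1.0], shape=1.0)
```

With these inputs the first location contributes 0.5 and the second contributes 0.25, so tasks far from all historical locations receive low willingness.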

III-C Worker Propagation Calculation

A worker who knows the information of a task has the potential to propagate it to other workers independently through social media. We propose worker propagation to measure the probability of workers being informed about tasks.

There are two main challenges when computing worker propagation. First, since the information propagation in a social network is complex, it is important to simulate the propagation process reasonably. Second, based on the propagation process, the computation of worker propagation should be completed in limited time since we aim to assign tasks to workers online. To address these challenges, we propose an approximation method, called Random reverse reachable-based Propagation Optimization (RPO), for calculating the worker propagation.

III-C1 Random Reverse Reachable Set Generation

We first detail how to generate Random Reverse Reachable (RRR) sets for workers, which will be used in the RPO method. The definition of an RRR set follows.

Definition 5 (Random Reverse Reachable Set).

Given a social network $G=(W,E)$, let $G^{\prime}$ be the reverse graph of $G$, and let $w_{i}$ be a worker selected uniformly at random from $G^{\prime}$. A subgraph $g_{i}$ is a directed graph sampled from $G^{\prime}$ under a given propagation model. A random reverse reachable (RRR) set for $w_{i}$ is the set of workers in $g_{i}$ that can reach $w_{i}$.

To generate an RRR set for each worker, it is important to select a suitable propagation model. Independent Cascade (IC) [28, 29, 30, 31, 32, 17, 33] is a commonly used propagation model, where users inform their neighbors independently. In our ITA problem, a worker who knows a task has the potential to propagate information about the task to the neighbors independently, which can be well modeled by IC. Therefore, we use the IC model to simulate the information propagation process of tasks and sample subgraphs from $G^{\prime}$ to generate RRR sets.

The IC model is an iterative model. At the beginning of the IC model, a worker $w_{s}$ who knows task $s$ is selected to inform the neighbors independently. In each iteration, if a worker $w_{i}$ has more than one neighbor knowing the task information in the current iteration, the worker will be informed by these neighbors independently. The probability, $P_{k}(w_{i})$, of worker $w_{i}$ being informed by the neighbors in the $k$-th iteration is calculated as follows:

$$P_{k}(w_{i})=1-\prod_{w_{j}\in\mathit{NE}_{k-1}(w_{i})}(1-P_{j}(w_{j},w_{i})),$$

where $\mathit{NE}_{k-1}(w_{i})$ is the set of neighbors of $w_{i}$ who know a given task in the $(k-1)$-th iteration, and $P_{j}(w_{j},w_{i})$ is an in-degree-based probability that neighbor $w_{j}$ of $w_{i}$ informs $w_{i}$, namely the ratio between $1$ and $w_{i}$’s in-degree.
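Because every informed neighbor of $w_{i}$ succeeds with the same in-degree-based probability, the product in the formula above collapses to a power. The following sketch computes the resulting probability; the function and parameter names are assumptions made for illustration.

```python
def inform_probability(in_degree, informed_neighbors):
    """Probability that a worker is informed in one IC iteration.

    in_degree          : number of incoming edges of the worker
    informed_neighbors : size of NE_{k-1}(w_i), the in-neighbors that
                         learned about the task in the previous iteration
    Each informed neighbor succeeds independently with probability
    1 / in_degree, so P_k = 1 - (1 - 1 / in_degree) ** informed_neighbors.
    """
    if in_degree <= 0:
        raise ValueError("in_degree must be positive")
    if not 0 <= informed_neighbors <= in_degree:
        raise ValueError("informed_neighbors must lie in [0, in_degree]")
    p_edge = 1.0 / in_degree
    return 1.0 - (1.0 - p_edge) ** informed_neighbors


# A worker with four in-neighbors, two of whom were just informed.
p_informed = inform_probability(in_degree=4, informed_neighbors=2)
```

With these inputs the probability is $1-0.75^{2}=0.4375$; it reaches 1 only when the worker has a single in-neighbor and that neighbor is informed.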

Workers who are informed have only one chance to inform their neighbors. In the $k$-th iteration of the IC model, a worker who is not informed is added to a subgraph $g_{s}$ with the probability $P_{k}$, and the edges connecting the worker and the neighbors who are informed in the $(k-1)$-th iteration are added to $g_{s}$ with the probability $P_{j}$. When no new workers are informed, the propagation process terminates, and $g_{s}$ is constructed. Accordingly, the RRR set of worker $w_{s}$ is the set of workers that can reach $w_{s}$ along a finite number of edges in the directed graph $g_{s}$.
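A common way to sample a reverse reachable set under the IC model is a reverse breadth-first search in which each incoming edge is kept independently with its propagation probability. The sketch below follows that standard procedure with the in-degree-based edge probability; it is an illustrative approximation of the generation process described above, and the adjacency format and function name are assumptions.

```python
import random
from collections import deque


def sample_rrr_set(in_neighbors, root, rng):
    """Sample one random reverse reachable set for `root`.

    in_neighbors maps a worker to the list of workers that have an edge
    pointing to it in the original social network G. Each incoming edge
    (u, v) is kept with probability 1 / in-degree(v).
    """
    reached = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        sources = in_neighbors.get(v, [])
        if not sources:
            continue
        p_edge = 1.0 / len(sources)
        for u in sources:
            # Flip one independent coin per incoming edge.
            if u not in reached and rng.random() < p_edge:
                reached.add(u)
                queue.append(u)
    return reached


# Chain a -> b -> c: every worker has in-degree at most 1, so each edge
# is kept with probability 1 and the sample is deterministic.
chain = {"b": ["a"], "c": ["b"]}
rrr_c = sample_rrr_set(chain, "c", random.Random(0))
```

In practice the root would be drawn uniformly at random from $W$ and the procedure repeated many times to build the collection of RRR sets.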

III-C2 Random Reverse Reachable-based Propagation Optimization Method

Next, we present the Random reverse reachable-based Propagation Optimization (RPO) method.

Based on Definition 5, if a worker $w_{s}$ appears in an RRR set of another worker $w_{i}$, the propagation process from $w_{s}$ should have a certain probability to inform $w_{i}$. Specifically, we obtain the following lemma.

Lemma 2.

Given two workers $w_{s}$ and $w_{i}$, the probability that $w_{i}$ is informed by $w_{s}$ under a propagation process equals the probability that $w_{s}$ belongs to an RRR set of $w_{i}$ [30].

Given a set of $N$ RRR sets, $\mathbb{R}=\{R_{1},R_{2},\ldots,R_{N}\}$, and a worker $w_{s}$ in $G$, let $\mathbb{R}_{i}$ ($\mathbb{R}_{i}\subseteq\mathbb{R}$) be a set of RRR sets that are generated by worker $w_{i}$ in $G$. Based on Lemma 2 and the linearity of expectation, we can calculate the informed probability $P_{\mathit{pro}}(w_{s},w_{i})$ of $w_{i}$ being informed by $w_{s}$, as follows:

$$P_{\mathit{pro}}(w_{s},w_{i})=\frac{|W|}{N}\cdot\mathbb{E}\left[\sum\nolimits_{j=1}^{|\mathbb{R}_{i}|}v_{j}\right],\qquad(3)$$

where $v_{j}=0$ if $\{w_{s}\}\cap R_{j}=\emptyset$; otherwise, $v_{j}=1$. $R_{j}$ is the $j$-th set of $\mathbb{R}_{i}$, $|W|$ is the number of workers in $G$, $|\mathbb{R}_{i}|$ is the size of $\mathbb{R}_{i}$, and $\mathbb{E}\left[\sum_{j}^{|\mathbb{R}_{i}|}v_{j}\right]$ is the expected number of RRR sets generated by $w_{i}$ that cover $w_{s}$. To ensure that the estimation of $P_{\mathit{pro}}(w_{s},w_{i})$ is accurate, it is essential that $N$ is sufficiently large so that $\sum_{j}^{|\mathbb{R}_{i}|}v_{j}$ does not deviate significantly from its expectation. The analysis of how to choose a setting for $N$ is covered in Section III-E.
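Given a collection of sampled RRR sets, Equation 3 can be approximated by replacing the expectation with the observed count. The sketch below does exactly that; storing each RRR set together with its root worker and the function name are assumptions made for illustration.

```python
def estimate_propagation(rrr_sets, source, target, num_workers):
    """Empirical version of Equation 3.

    rrr_sets    : list of (root, members) pairs, one per sampled RRR set
    source      : w_s, the worker who knows the task
    target      : w_i, the worker who may be informed
    num_workers : |W|
    The expectation is replaced by the observed number of RRR sets that
    are rooted at `target` and contain `source`.
    """
    n = len(rrr_sets)
    if n == 0:
        return 0.0
    hits = sum(
        1 for root, members in rrr_sets
        if root == target and source in members
    )
    return num_workers / n * hits


# Four hand-written RRR sets over three workers (illustrative only).
samples = [
    ("c", {"a", "b", "c"}),
    ("c", {"c"}),
    ("b", {"a", "b"}),
    ("a", {"a"}),
]
p_a_to_c = estimate_propagation(samples, source="a", target="c", num_workers=3)
```

With few samples the raw estimate can be noisy and may even exceed 1, which is one reason why a sufficiently large number of RRR sets is needed.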

The whole process of the RPO method is given in Algorithm 1. The main computational challenge is the huge search space when enumerating all possible RRR sets of each worker, which grows exponentially with the number of workers. Therefore, it is important to obtain a limited number of RRR sets that still makes it possible to guarantee an approximation ratio for the probability that workers are informed. To achieve this, we propose two lower bounds on the number of RRR sets: an iteration-based lower bound $N_{R}(k)$ and a threshold-based lower bound $N_{R}^{\prime}(\gamma)$. Specifically, given a worker $w_{s}$ who knows task $s$ and a social network $G=(W,E)$ as input, Algorithm 1 iteratively generates $N_{R}(k)$ RRR sets (stored in $\mathbb{R}$) based on $G$ (lines 4-8), where $N_{R}(k)$ is the iteration-based lower bound on the number of RRR sets. Then, for each worker $w_{i}\in W$, the algorithm computes the number $N_{p}(w_{i})$ of workers to which $w_{i}$ can propagate the task information based on the current $\mathbb{R}$ and finds the maximal $N_{p}(w_{i})$, denoted as $N_{p}^{opt}=\max_{w_{i}\in W}N_{p}(w_{i})$ (lines 9-10). If $N_{p}^{opt}$ is larger than a threshold $\gamma$, a threshold-based lower bound $N_{R}^{\prime}(\gamma)$ on the number of RRR sets is computed based on the threshold $\gamma$ and $N_{p}^{opt}$ (lines 11-13); otherwise, $\mathbb{R}$ is set to $\emptyset$ (lines 15-17). Next, the algorithm continues to generate $(N_{R}^{\prime}(\gamma)-|\mathbb{R}|)$ RRR sets when the size of the current $\mathbb{R}$ is too small, i.e., when $|\mathbb{R}|<N_{R}^{\prime}(\gamma)$, where $N_{R}^{\prime}(\gamma)$ is the bound computed in the iteration (lines 21-22). Having obtained a set $\mathbb{R}$ of suitable RRR sets, we can compute $P_{\mathit{pro}}(w_{s},w_{i})$ ($w_{i}\in W$) according to Equation 3 and output the worker propagation for a worker, i.e., $\mathit{WP}_{w_{s}}\leftarrow(P_{\mathit{pro}}(w_{s},w_{1}),\ldots,P_{\mathit{pro}}(w_{s},w_{|W|}))$ (lines 24-27). The computation of $k$, $N_{R}(k)$, $N_{p}(w_{i})$, $N_{p}^{opt}$, $N_{R}^{\prime}(\gamma)$, $\gamma$, and the approximation ratio is discussed in Section III-E.

Input: a worker $w_s$ who knows task $s$; a social network $G=(W,E)$
Output: worker propagation $\mathit{WP}_{w_s}$ of $w_s$
1
2 $\mathbb{R} \leftarrow \emptyset$;
3 $k \leftarrow |W|/2$;
4 repeat
5
6   Compute $N_R(k)$;
    // $N_R(k)$ is the iteration-based lower bound on the number of RRR sets.
7   Generate $N_R(k)$ RRR sets based on $G$ (according to Section III-C1);
8   Insert these RRR sets into $\mathbb{R}$;
9   Compute $N_p(w_i)$ ($w_i \in W$) based on $\mathbb{R}$;
    // $N_p(w_i)$ denotes the number of workers to which $w_i$ can propagate the task information.
10  Find the maximal $N_p(w_i)$, denoted as $N_p^{opt}$;
11  if $N_p^{opt} \geq \gamma$ then
12    Compute $N_R'(\gamma)$ based on $N_p^{opt}$;
      // $N_R'(\gamma)$ is the threshold-based lower bound on the number of RRR sets.
13    break;
14
15  else
16    $\mathbb{R} \leftarrow \emptyset$;
17    $k \leftarrow k/2$;
18
19
20 until $k = 2$;
21 if $|\mathbb{R}| < N_R'(\gamma)$ then
22   Generate $(N_R'(\gamma) - |\mathbb{R}|)$ RRR sets and insert them into $\mathbb{R}$;
23
24 for each $w_i \in W \setminus \{w_s\}$ do
25   Compute $P_{\mathit{pro}}(w_s, w_i)$ based on $\mathbb{R}$ (according to Equation 3);
26
27 $\mathit{WP}_{w_s} \leftarrow (P_{\mathit{pro}}(w_s, w_1), \ldots, P_{\mathit{pro}}(w_s, w_{|W|}))$;
Return $\mathit{WP}_{w_s}$
Algorithm 1 RPO

The complexity of RPO is dominated by the generation of RRR sets, which takes $O(|E| + |\mathbb{R}| \cdot |M| \cdot |E| / |W|)$ time, where $|E|$ is the number of edges in $G$, $|\mathbb{R}|$ is the size of $\mathbb{R}$, $|M|$ is the number of workers who can be informed by the greedy informed worker (see Definition 8), and $|W|$ is the number of workers in $G$.

III-D Worker-Task Influence Calculation

We combine worker-task affinity $P_{\mathit{aff}}$, worker willingness $P_{\mathit{wil}}$, and worker propagation $\mathit{WP}_{w_s}$ to calculate the worker-task influence, $\mathit{if}(w_s, s)$, of worker $w_s$ and task $s$, as follows:

$$\mathit{if}(w_s, s) = P_{\mathit{aff}}(w_s, s) \cdot \sum_{w_i \in W \setminus \{w_s\}} P_{\mathit{wil}}(w_i, s) \cdot P_{\mathit{pro}}(w_s, w_i),$$

where $w_s$ is a worker who knows the information of $s$, $W$ denotes the worker set in the social network, and $P_{\mathit{pro}}(w_s, w_i)$ is the $i$-th value in $\mathit{WP}_{w_s}$.
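As a minimal sketch of this formula, the function below multiplies the affinity by the willingness-weighted sum of propagation probabilities. The function name, the dictionary-based inputs, and the example values are illustrative assumptions, not part of the original method description.

```python
from typing import Mapping


def worker_task_influence(
    p_aff: float,
    p_wil: Mapping[str, float],
    p_pro: Mapping[str, float],
    source: str,
) -> float:
    """if(w_s, s) = P_aff(w_s, s) * sum over w_i != w_s of P_wil(w_i, s) * P_pro(w_s, w_i).

    p_wil maps worker ids to P_wil(w_i, s); p_pro maps worker ids to P_pro(w_s, w_i).
    Workers missing from p_wil are treated as having zero willingness (an assumption).
    """
    total = 0.0
    for worker, pro in p_pro.items():
        if worker == source:
            continue  # the sum excludes w_s itself
        total += p_wil.get(worker, 0.0) * pro
    return p_aff * total


# Hypothetical values: 0.5 * (0.2 * 0.5 + 0.4 * 0.25)
example_influence = worker_task_influence(
    p_aff=0.5,
    p_wil={"w1": 0.2, "w2": 0.4},
    p_pro={"ws": 1.0, "w1": 0.5, "w2": 0.25},
    source="ws",
)
```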

III-E Feasibility Analysis

In order to guarantee an approximation ratio of computing worker propagation based on Random Reverse Reachable (RRR) sets, we present a feasibility analysis of computing a suitable number NN of RRR sets. First, we introduce the notions of informed range and martingale and then use these to propose lemmas that facilitate the computation of the number of RRR sets. Then corresponding proofs are provided to guarantee a high approximation ratio between the estimated worker propagation and the worker propagation computed based on RRR sets. The notions of informed range and martingale [34] are defined as follows:

Definition 6 (Informed Range).

Given a worker $w_s$ and a set $\mathbb{R} = \{R_1, R_2, \ldots, R_N\}$ of RRR sets, the informed range $\sigma(w_s)$ of $w_s$ is the expected number of workers that are informed by $w_s$:

$$\sigma(w_s) = \sum_{i=1}^{|W|} P_{\mathit{pro}}(w_s, w_i) = \frac{|W|}{N} \cdot \mathbb{E}\left[\sum_{j=1}^{N} v_j\right],$$

where $v_j = 0$ if $\{w_s\} \cap R_j = \emptyset$; otherwise, $v_j = 1$.

Definition 7 (Martingale).

A sequence of random variables $x_1, x_2, \ldots$ is a martingale if and only if $\mathbb{E}[|x_i|] < +\infty$ and $\mathbb{E}[x_i \mid x_1, x_2, \ldots, x_{i-1}] = x_{i-1}$ for any $i$.

An important property of martingales [34] is shown as follows:

Lemma 3.

Given a martingale $x_1, x_2, \ldots$ such that $|x_1| \leq l_1$ and $|x_j - x_{j-1}| \leq l_1$ for $j \in [2, i]$, and $\mathit{Var}[x_1] + \sum_{j=2}^{i} \mathit{Var}[x_j \mid x_1, x_2, \ldots, x_{j-1}] \leq l_2$, for any $\epsilon > 0$,

$$\Pr\left[x_i - \mathbb{E}[x_i] \geq \epsilon\right] \leq \exp\left(-\frac{\epsilon^2}{\frac{2}{3} l_1 \epsilon + 2 l_2}\right),$$

where $\mathit{Var}[\cdot]$ is the variance of a random variable.

Given a worker $w_s$, since each worker $w_i$ is selected uniformly at random to generate $R_i$ and the generation of $R_i$ is independent of $R_1, R_2, \ldots, R_{i-1}$, we have $\mathbb{E}[v_i \mid v_1, v_2, \ldots, v_{i-1}] = \mathbb{E}[v_i] = \sigma(w_s)/|W|$. Let $\alpha = \sigma(w_s)/|W|$ and $x_i = \sum_{j=1}^{i} (v_j - \alpha)$. It is clear that $\mathbb{E}[x_i] = 0$ and $\mathbb{E}[x_i \mid x_1, x_2, \ldots, x_{i-1}] = x_{i-1}$, which indicates that $x_1, x_2, \ldots, x_N$ is a martingale. Moreover, since $x_i - x_{i-1} = v_i - \alpha$, it is clear that $|x_1| \leq 1$ and $|x_i - x_{i-1}| \leq 1$ for any $i \in [2, N]$. Combining this with the independence of the $R_i$, we have:

$$\mathit{Var}[x_1] + \sum_{i=2}^{N} \mathit{Var}[x_i \mid x_1, x_2, \ldots, x_{i-1}] = N\alpha(1-\alpha) \leq N\alpha \tag{4}$$
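The step behind Equation 4, and the way it feeds into Lemma 3, can be spelled out as follows. This is a short worked derivation using only the quantities defined above ($v_i$ is a 0/1 variable with mean $\alpha$, and the deviation $\epsilon N \alpha$ is substituted for the $\epsilon$ of Lemma 3); it is a reading aid rather than part of the original proof.

```latex
% Each increment x_i - x_{i-1} = v_i - \alpha depends only on the independent set R_i,
% and v_i is a Bernoulli variable with mean \alpha, so
\mathit{Var}[x_i \mid x_1, \ldots, x_{i-1}] = \mathit{Var}[v_i] = \alpha(1-\alpha),
\qquad
\mathit{Var}[x_1] = \alpha(1-\alpha).
% Summing the N terms gives N\alpha(1-\alpha) \le N\alpha, i.e., l_2 = N\alpha and l_1 = 1.
% Substituting l_1 = 1, l_2 = N\alpha and the deviation \epsilon N \alpha into Lemma 3:
\exp\left(-\frac{(\epsilon N \alpha)^2}{\frac{2}{3}\,\epsilon N \alpha + 2 N \alpha}\right)
= \exp\left(-\frac{\epsilon^2}{2 + \frac{2}{3}\epsilon}\cdot N\alpha\right).
```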

Based on Lemma 3 and Equation 4, the following corollary is derived.

Corollary 1.

For any $\epsilon > 0$,

$$\Pr\left[\sum_{j}^{N} v_j - N \cdot \alpha \geq \epsilon \cdot N \cdot \alpha\right] \leq \exp\left(-\frac{\epsilon^2}{2 + \frac{2}{3}\epsilon} \cdot N \cdot \alpha\right)$$

Applying Lemma 3 to the martingale $-x_1, -x_2, \ldots, -x_N$ gives the following corollary.

Corollary 2.

For any $\epsilon > 0$,

$$\Pr\left[\sum_{j}^{N} v_j - N \cdot \alpha \leq -\epsilon \cdot N \cdot \alpha\right] \leq \exp\left(-\frac{\epsilon^2}{2} \cdot N \cdot \alpha\right)$$

Given a set $\mathbb{R} = \{R_1, R_2, \ldots, R_N\}$ of RRR sets, let $f_R(w_s)$ be the fraction of RRR sets in $\mathbb{R}$ that cover $w_s$. The number $N_p(w_s)$ of workers who can be informed by $w_s$ is then $|W| \cdot f_R(w_s)$. Based on Corollary 2, we have the following lemma.

Lemma 4.

Let $\lambda \in (0,1)$, $\epsilon > 0$, and

$$N' = \frac{2|W| \cdot \ln(1/\lambda)}{\sigma(w_s) \cdot \epsilon^2} \tag{5}$$

If $N' \leq N$, then $N_p(w_s) \geq (1-\epsilon) \cdot \sigma(w_s)$ holds with probability at least $1-\lambda$.
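To give a feel for the magnitude of Equation 5, the sketch below evaluates it numerically. The function name, the rounding up to an integer, and the sample value of the informed range are assumptions made for illustration; only the formula itself comes from the lemma.

```python
import math


def rrr_sets_needed(num_workers: int, sigma: float, eps: float, lam: float) -> int:
    """Equation 5: N' = 2|W| ln(1/lambda) / (sigma * eps^2), rounded up to an integer."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must lie in (0, 1)")
    if eps <= 0.0 or sigma <= 0.0:
        raise ValueError("eps and sigma must be positive")
    return math.ceil(2.0 * num_workers * math.log(1.0 / lam) / (sigma * eps ** 2))


# Hypothetical setting: |W| = 1200, informed range 60, eps = 0.1, lambda = 1/|W|.
n_prime = rrr_sets_needed(num_workers=1200, sigma=60.0, eps=0.1, lam=1.0 / 1200)
```

Because $\sigma(w_s)$ sits in the denominator, a worker with a small informed range needs many more RRR sets for the same guarantee, which is the motivation for bounding it via the worker with the maximum informed range.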

Lemma 4 shows that once the number $N$ of RRR sets reaches $N'$ (Equation 5), the worker propagation computed from them is a $(1-\epsilon)$-approximation with probability at least $1-\lambda$. However, the value of $\sigma$ differs across workers, which makes it hard to derive a single suitable value of $N'$. Moreover, $N'$ should be as small as possible to reduce the computation time. To address this issue, we derive a lower bound on $N'$. Let $w_s^{\tau}$ be a worker with maximum informed range, i.e., $\sigma(w_s^{\tau}) \geq \sigma(w_s)$ for any $w_s$ in $G$. We can rewrite Lemma 4 as follows:

Lemma 5.

Let $\lambda \in (0,1)$, $\epsilon > 0$, and

$$N_R'(\gamma) = \frac{2|W| \cdot \ln(1/\lambda)}{\sigma(w_s^{\tau}) \cdot \epsilon^2}$$

If $N_R'(\gamma) \leq N$, then $N_p(w_s^{\tau}) \geq (1-\epsilon) \cdot \sigma(w_s^{\tau})$ holds with probability at least $1-\lambda$.

The $\gamma$ in Lemma 5 is a threshold that is defined in Lemma 6. Since we want the approximation guarantee to hold with a probability that is as high as possible, $\lambda$ should be set as low as possible (e.g., $\lambda = 1/|W|^{o}$, where $o \geq 1$).

However, in real cases, $\sigma(w_s^{\tau})$ is unknown in advance. To address this problem, we derive a lower bound on $\sigma(w_s^{\tau})$ with the help of a so-called greedy informed worker that can be calculated in advance. A greedy informed worker is defined as follows:

Definition 8 (Greedy Informed Worker).

A greedy informed worker $w_s^{\theta}$ is a worker generated by a greedy approach that maximizes $f_R$: $w_s^{\theta} = \arg\max_{w_i \in W} f_R(w_i)$.

To use $w_s^{\theta}$ to derive the lower bound on $\sigma(w_s^{\tau})$, we construct a test $T(\cdot)$ related to $w_s^{\theta}$ and run it on a set of values $K = \{k_1, k_2, \ldots\}$. The test is designed so that if $k_i > \sigma(w_s^{\tau})$, then $T(k_i) = \mathit{false}$ with high probability; hence, when $T(k_i) = \mathit{true}$, $k_i$ can be taken as a lower bound on $\sigma(w_s^{\tau})$. Let $N_p^{\mathit{opt}} = |W| \cdot f_R(w_s^{\theta})$. Based on Corollary 1, we construct $T(\cdot)$ using the following lemma:

Lemma 6.

Given $\epsilon^* > 0$, let $k_i \in K$, $\gamma = (1+\epsilon^*) \cdot k_i$, $\lambda^* \in (0,1)$, and

$$N \geq N_R(k_i) = \frac{\left(2 + \frac{2}{3}\epsilon^*\right) \cdot \left(\ln(|W|) + \ln(1/\lambda^*)\right) \cdot |W|}{{\epsilon^*}^2 \cdot k_i}$$

If $\sigma(w_s^{\tau}) < k_i$, then $N_p^{\mathit{opt}} < \gamma$ holds with probability at least $1-\lambda^*$.

Based on Lemma 6, if $N_p^{\mathit{opt}} \geq \gamma$, then $\sigma(w_s^{\tau}) \geq k_i$ holds with probability at least $1-\lambda^*$, and $\sigma(w_s^{\tau})$ can be set to $N_p^{\mathit{opt}} \cdot k_i/\gamma$. We can set $K = \{|W|/2, |W|/4, |W|/8, \ldots, 2\}$ and then run the test $T(k_i): N_p^{\mathit{opt}} \geq \gamma$ on the $O(\log_2 |W|)$ values of $K$ to compute the lower bound on $\sigma(w_s^{\tau})$. Moreover, since we need to guarantee the approximation with high probability (e.g., $1 - 1/|W|^{o}$), $\lambda^*$ can be set to $\frac{1}{|W|^{o} \cdot \log_2 |W|}$.
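The halving test described above can be sketched as follows, under stated assumptions: `estimate_np_opt` is a hypothetical callback that generates the requested number of RRR sets and returns $|W| \cdot \max_i f_R(w_i)$, and the loop tries every value in $K = \{|W|/2, |W|/4, \ldots, 2\}$ as listed in the text. The function names are illustrative.

```python
import math
from typing import Callable, Optional


def n_r(k: float, num_workers: int, eps_star: float, lam_star: float) -> int:
    """Iteration-based bound N_R(k) from Lemma 6, rounded up to an integer."""
    numerator = (
        (2.0 + 2.0 * eps_star / 3.0)
        * (math.log(num_workers) + math.log(1.0 / lam_star))
        * num_workers
    )
    return math.ceil(numerator / (eps_star ** 2 * k))


def lower_bound_sigma(
    num_workers: int,
    eps_star: float,
    lam_star: float,
    estimate_np_opt: Callable[[int], float],
) -> Optional[float]:
    """Run the test N_p^opt >= gamma for k = |W|/2, |W|/4, ..., 2.

    Returns N_p^opt * k / gamma for the first k that passes, or None if none passes.
    """
    k = num_workers / 2.0
    while k >= 2.0:
        n_sets = n_r(k, num_workers, eps_star, lam_star)
        np_opt = estimate_np_opt(n_sets)  # hypothetical: |W| * max_i f_R(w_i) on n_sets RRR sets
        gamma = (1.0 + eps_star) * k
        if np_opt >= gamma:
            return np_opt * k / gamma  # equals np_opt / (1 + eps_star)
        k /= 2.0
    return None


# Stub estimator that always reports N_p^opt = 100, purely for illustration.
sigma_bound = lower_bound_sigma(1024, 0.5, 0.01, lambda n_sets: 100.0)
```

Smaller values of $k$ require more RRR sets, so starting from $|W|/2$ and halving keeps the cheap tests first.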

Based on Lemmas 5 and 6, $N$ should be set to $\max\{N_R'(\gamma), N_R(k_i)\}$ to guarantee the approximation. However, it is difficult to choose suitable settings for $\epsilon$ and $\epsilon^*$ that minimize $\max\{N_R'(\gamma), N_R(k_i)\}$. To address that problem, $\max\{N_R'(\gamma), N_R(k_i)\}$ can be approximated with a simple function of $\epsilon$ and $\epsilon^*$, and then $\epsilon^* = \sqrt{2}\epsilon$ is derived as the minimizer of the function.

The proofs of the above lemmas are as follows:

Proof of Lemma 4. Given any worker $w_s$, let $\alpha = \mathbb{E}[f_R(w_s)]$. By Lemma 2, $\alpha = \mathbb{E}[f_R(w_s)] = \sigma(w_s)/|W|$.

Based on Corollary 2, we have

$$\begin{aligned}
&\Pr[N_p(w_s) \leq (1-\epsilon) \cdot \sigma(w_s)] \\
&= \Pr[f_R(w_s) - \alpha \leq -\epsilon \cdot \alpha] \\
&= \Pr[N \cdot f_R(w_s) - N \cdot \alpha \leq -\epsilon \cdot N \cdot \alpha] \\
&= \Pr\left[\sum_{j}^{N} v_j - N \cdot \alpha \leq -\epsilon \cdot N \cdot \alpha\right] \\
&\leq \exp(-\epsilon^2 \cdot N \cdot \alpha / 2) \\
&\leq \exp(-\epsilon^2 \cdot N' \cdot \alpha / 2) = \lambda
\end{aligned}$$

The proof of Lemma 5 is similar to that of Lemma 4 when setting $\alpha = \mathbb{E}[f_R(w_s^{\tau})]$.

Proof of Lemma 6. Given any worker $w_s$, let $\alpha = \mathbb{E}[f_R(w_s)]$. The following inequality can be derived.

$$\alpha = \mathbb{E}[f_R(w_s)] \leq \mathbb{E}[f_R(w_s^{\tau})] = \sigma(w_s^{\tau})/|W| < k_i/|W|$$

Let $\beta = \frac{(1+\epsilon^*) \cdot k_i}{|W| \cdot \alpha} - 1$. By the inequality above, $k_i/(|W| \cdot \alpha) > 1$, so we get $\beta > \frac{\epsilon^* \cdot k_i}{|W| \cdot \alpha} > \epsilon^*$. Based on Corollary 1, we have

$$\begin{aligned}
&\Pr\left[N_p(w_s) \geq \gamma\right] \\
&= \Pr\left[f_R(w_s) - \alpha \geq \left(\frac{\gamma}{|W| \cdot \alpha} - 1\right) \cdot \alpha\right] \\
&= \Pr\left[N \cdot f_R(w_s) - N \cdot \alpha \geq \left(\frac{\gamma}{|W| \cdot \alpha} - 1\right) \cdot N \cdot \alpha\right] \\
&\leq \exp\left(-\frac{\beta^2 \cdot N \cdot \alpha}{2 + \frac{2}{3}\beta}\right) \leq \exp\left(-\frac{\beta^2 \cdot N_R(k_i) \cdot \alpha}{2 + \frac{2}{3}\beta}\right) \\
&< \exp\left(-\frac{1}{2/\epsilon^* + 2/3} \cdot \frac{(2 + 2\epsilon^*/3) \cdot \ln\left(|W|/\lambda^*\right)}{\epsilon^*}\right) \\
&= \lambda^*/|W|
\end{aligned}$$

According to the union bound and $N_p(w_s) \leq N_p^{\mathit{opt}}$, we have $N_p^{\mathit{opt}} < \gamma$ with probability at least $1-\lambda^*$.

IV Influence-aware Task Assignment

We propose three algorithms, including basic, Entropy-based, and Distance-based Influence-aware Assignment, abbreviated IA, EIA, and DIA, respectively, that solve the ITA problem.

IV-A Influence-aware Assignment

Taking worker-task influence as the priority of task assignment, we propose a basic Influence-aware Assignment (IA) algorithm to solve the ITA problem by transforming it to a Minimum Cost Maximum Flow (MCMF) [11] problem.

To adapt MCMF to the ITA problem, we first construct a task assignment graph based on the available workers and tasks. Specifically, given a set of workers $W = \{w_1, w_2, \ldots\}$ and a set of tasks $S = \{s_1, s_2, \ldots\}$ at time $t$, we construct a graph $G = (N, E)$, where $N$ and $E$ denote sets of nodes and edges, respectively. Let $|N| = |W| + |S| + 2$ and $|E| = |W| + |S| + m$, where $m$ is the number of available assignments for all workers. Since tasks expire at their deadlines and workers only accept tasks in their reachable range, the available assignments for worker $w$, denoted as $w.A$, should satisfy the following conditions:

i) task $s$ is located in the reachable circular range of worker $w$, i.e., $d(w.l, s.l) \leq w.r$.

ii) worker $w$ has enough time to reach the location of $s$ before it expires, i.e., $t + t(w.l, s.l) \leq s.p + s.\varphi$.

We use $d(w.l, s.l)$ to denote the Euclidean distance between $w.l$ and $s.l$, and $t(w.l, s.l)$ to denote the travel time from $w.l$ to $s.l$. For simplicity, we assume that all workers share the same travel speed, so that travel time and distance are equivalent. However, the proposed algorithms can also handle workers moving at different speeds. Let $|w.A|$ be the number of available assignments for worker $w$; then $m$ is obtained by summing $|w.A|$ over all workers: $m = \sum_{w \in W} |w.A|$.
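Conditions i) and ii) can be checked with a few lines of code. The sketch below assumes planar coordinates in kilometres, times in hours, and a shared speed parameter; the `Worker` and `Task` classes and their field names are hypothetical containers introduced only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    x: float  # location w.l, x coordinate (km)
    y: float  # location w.l, y coordinate (km)
    r: float  # reachable radius w.r (km)


@dataclass
class Task:
    x: float    # location s.l, x coordinate (km)
    y: float    # location s.l, y coordinate (km)
    p: float    # publication time s.p (h)
    phi: float  # valid time s.phi (h)


def is_available(worker: Worker, task: Task, now: float, speed_kmh: float = 5.0) -> bool:
    """True if the task satisfies both the range and the deadline condition."""
    dist = math.hypot(worker.x - task.x, worker.y - task.y)  # d(w.l, s.l)
    travel_time = dist / speed_kmh                            # t(w.l, s.l)
    in_range = dist <= worker.r                               # condition i)
    in_time = now + travel_time <= task.p + task.phi          # condition ii)
    return in_range and in_time


w = Worker(x=0.0, y=0.0, r=25.0)
near = is_available(w, Task(x=3.0, y=4.0, p=0.0, phi=5.0), now=0.0)
```

Counting the pairs for which `is_available` holds over all workers gives the quantity $m$ used in the edge count.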

In graph $G$, nodes $n_i$ and $n_{|W|+j}$ correspond to a worker $w_i$ and a task $s_j$, respectively. Moreover, we add two new nodes (denoted as $n_0$ and $n_{|W|+|S|+1}$) as the source ($N_s$) and destination ($N_d$), respectively. An example graph $G$ for four workers and four tasks at the same time is illustrated in Figure 4. The graph is generated by the following steps:

i) $N_s$ connects to all worker nodes, and the capacities of the corresponding edges are set to 1, i.e., $c = 1$, since each worker can perform only one task at a time. The costs of these edges are set to 0.

ii) Each task node connects to $N_d$, and the capacities of the corresponding edges are set to 1, indicating that each task can be assigned to at most one worker. The costs of these edges are set to 0.

iii) If the assignment $(s_j, w_i)$ is available, i.e., $(s_j, w_i) \in w_i.A$, we add an edge from worker node $n_i$ to task node $n_{|W|+j}$. The capacity of this edge is set to 1, and its cost (denoted as $w(n_i, n_{|W|+j})$) is the reciprocal of one plus the worker-task influence, $\mathit{if}(w_i, s_j)$, of $w_i$ and $s_j$, i.e., $w(n_i, n_{|W|+j}) = \frac{1}{\mathit{if}(w_i, s_j) + 1}$.

Figure 4: Task Assignment Graph

The task assignment problem is then converted into an MCMF problem in the directed graph $G$ from $N_s$ to $N_d$, which is to achieve the maximum flow (i.e., maximizing the number of task assignments) while minimizing the cost (i.e., maximizing worker-task influence). The Ford-Fulkerson algorithm [35] is employed to compute the maximum flow of the graph, and then linear programming is used to minimize the cost of the flow [11].
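For illustration, the sketch below builds the task assignment graph and solves the MCMF problem with successive shortest augmenting paths (Bellman-Ford on the residual graph). This is a swapped-in textbook technique chosen to keep the example self-contained, not the Ford-Fulkerson plus linear programming pipeline cited above. The function name and the 1-based integer ids are assumptions.

```python
from typing import Dict, List, Tuple


def influence_aware_assignment(
    num_workers: int,
    num_tasks: int,
    influence: Dict[Tuple[int, int], float],
) -> List[Tuple[int, int]]:
    """Return the (worker, task) pairs of a min-cost max-flow on the assignment graph.

    influence[(i, j)] is if(w_i, s_j) for each available pair, with 1-based ids.
    """
    src, dst = 0, num_workers + num_tasks + 1
    n = dst + 1
    to: List[int] = []      # head node of each residual edge
    cap: List[int] = []     # remaining capacity of each residual edge
    cost: List[float] = []  # cost per unit of flow
    adj: List[List[int]] = [[] for _ in range(n)]

    def add_edge(u: int, v: int, c: float) -> int:
        # The forward edge gets an even index e; its reverse twin is e ^ 1.
        adj[u].append(len(to))
        to.append(v)
        cap.append(1)
        cost.append(c)
        adj[v].append(len(to))
        to.append(u)
        cap.append(0)
        cost.append(-c)
        return len(to) - 2

    for i in range(1, num_workers + 1):
        add_edge(src, i, 0.0)                # step i): source -> worker
    for j in range(1, num_tasks + 1):
        add_edge(num_workers + j, dst, 0.0)  # step ii): task -> destination
    pair_edge: Dict[int, Tuple[int, int]] = {}
    for (i, j), inf in influence.items():    # step iii): worker -> task, cost 1 / (if + 1)
        pair_edge[add_edge(i, num_workers + j, 1.0 / (inf + 1.0))] = (i, j)

    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * n
        prev = [-1] * n
        dist[src] = 0.0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]] - 1e-12:
                        dist[to[e]] = dist[u] + cost[e]
                        prev[to[e]] = e
                        changed = True
            if not changed:
                break
        if prev[dst] == -1:
            break  # no augmenting path is left, so the flow is maximum
        v = dst
        while v != src:  # push one unit of flow along the path
            e = prev[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]

    # A saturated worker -> task edge corresponds to an assignment.
    return sorted(pair for e, pair in pair_edge.items() if cap[e] == 0)


# Worker 2 can only reach task 1, so worker 1 is moved to task 2 to assign both tasks.
pairs = influence_aware_assignment(2, 2, {(1, 1): 9.0, (1, 2): 1.0, (2, 1): 4.0})
```

The example shows why the number of assignments takes priority: a purely greedy choice of the highest-influence pair (worker 1, task 1) would leave worker 2 without a task.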

IV-B Entropy-based Influence-aware Assignment (EIA)

In SC, each task has a location. If many workers are close to a task, i.e., the relative proportion of workers close to the task is high, the task is more likely to be completed. Considering that location entropy [36, 11] is an efficient metric to measure the total number of workers in the location of a task as well as the relative proportion of their visits to that location, we use it to measure the relative proportion of workers in the location of a specific task. Lower location entropy indicates that the distribution of the visits to that task is restricted to only a few workers. To maximize the total number of task assignments, a task located in a region with smaller location entropy should be given higher priority when making assignments. Let $\mathit{Num}_w$ denote the historical number of visits of worker $w$ to the location of task $s$, and let $\mathit{Num}_s$ denote the total number of visits of all workers to the location of task $s$. Then the location entropy $s.e$ of task $s$ is computed as follows:

$$s.e = -\sum_{w \in W_s} P_s(w) \cdot \ln P_s(w),$$

where $W_s$ is the set of workers that have performed task $s$ historically, and $P_s(w) = \mathit{Num}_w / \mathit{Num}_s$.
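The location entropy can be computed directly from per-worker visit counts. In the sketch below, the function name and the dictionary input are illustrative assumptions, and returning zero for a location without any visits is a convention chosen here rather than something stated in the text.

```python
import math
from typing import Mapping


def location_entropy(visits: Mapping[str, int]) -> float:
    """s.e = -sum_w P_s(w) ln P_s(w), with P_s(w) = Num_w / Num_s.

    visits maps each worker id to Num_w, the worker's number of visits to the task location.
    """
    total = sum(visits.values())  # Num_s
    entropy = 0.0
    for count in visits.values():
        if count > 0:  # workers without visits contribute nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy


single_visitor = location_entropy({"w1": 5})
two_equal_visitors = location_entropy({"w1": 3, "w2": 3})
```

A location visited by a single worker has entropy 0, while visits spread evenly over more workers give higher values, matching the interpretation above.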

Considering worker-task influence and location entropy, EIA adapts IA by setting the cost $w(n_i, n_{|W|+j})$ of each edge that connects $w_i$ and $s_j$ to $(s_j.e + 1)/(\mathit{if}(w_i, s_j) + 1)$, where $s_j.e$ is the location entropy of task $s_j$ and $\mathit{if}(w_i, s_j)$ is the worker-task influence of worker $w_i$ and task $s_j$.

IV-C Distance-based Influence-aware Assignment (DIA)

The IA algorithm fails to consider travel costs between the locations of workers and tasks. Workers are more likely to perform nearby tasks [19, 1], and travel cost is a critical factor when workers choose which tasks to perform. We compute the travel cost between a worker $w_i$ and a task $s_j$, denoted as $d(w_i.l, s_j.l)$, using Euclidean distance. Workers who are closer to tasks are given higher priority to perform them. To achieve this, we propose a Distance-based Influence-aware Assignment (DIA) algorithm that uses travel costs to discount worker-task influence. Specifically, DIA modifies IA by setting the cost $w(n_i, n_{|W|+j})$ of each edge that connects $w_i$ and $s_j$ to $1/(F(w_i.l, s_j.l) \cdot \mathit{if}(w_i, s_j) + 1)$, where $F(w_i.l, s_j.l) = 1 - \min(1, d(w_i.l, s_j.l)/w_i.r)$.
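The three algorithms differ only in the cost placed on each worker-task edge, which the sketch below puts side by side. The function names are illustrative; the formulas are the ones given for IA, EIA, and DIA.

```python
def ia_cost(influence: float) -> float:
    """IA: 1 / (if + 1)."""
    return 1.0 / (influence + 1.0)


def eia_cost(influence: float, entropy: float) -> float:
    """EIA: (s.e + 1) / (if + 1); lower location entropy gives a cheaper edge."""
    return (entropy + 1.0) / (influence + 1.0)


def dia_cost(influence: float, distance: float, radius: float) -> float:
    """DIA: 1 / (F * if + 1) with F = 1 - min(1, d / w.r)."""
    discount = 1.0 - min(1.0, distance / radius)
    return 1.0 / (discount * influence + 1.0)


# With if = 3: a task at the worker's location keeps the IA cost, while a task at the
# edge of the reachable range loses the whole influence discount.
cost_at_worker = dia_cost(3.0, distance=0.0, radius=25.0)
cost_at_boundary = dia_cost(3.0, distance=25.0, radius=25.0)
```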

V Experimental Evaluation

V-A Experimental Setup

Due to the lack of benchmarks for spatial crowdsourcing task assignment algorithms, two check-in datasets consisting of social networks of workers and workers’ check-ins, from Brightkite (BK) [37] and FourSquare (FS) [38], are used to simulate a spatial crowdsourcing scenario. This is common practice when evaluating SC platforms [39, 3, 40, 13]. Since BK does not contain category information of venues, we extract the categories of the venues with the aid of the FourSquare API (https://developer.foursquare.com/docs/). BK has 58,228 users, 214,078 social connections, and 4,491,143 check-ins from April 2008 to October 2010. FS has 11,326 users, 47,164 social connections, and 1,385,223 check-ins from January to December 2011.

We assume that all users are workers since users who check in at different spots are good candidates to perform nearby spatial tasks, and we assume that their locations are those of the most recent check-ins. Moreover, we set the time granularity to one day, during which the available tasks and workers are entered into our framework. We also assume that users who check in at a time instance are available workers for that time instance, and we assume that a worker is online until the worker is assigned a task. For each check-in venue, we use its location and the earliest check-in time of the day as the location and publication time of a task. Further, the categories of check-in locations are regarded as task categories. We set the number of topics used to extract worker-task affinity to 50, i.e., $|\mathit{Top}| = 50$. The informed probability of each social network edge, $e$, is set to $1/\mathit{id}_e$ [29, 41, 31], i.e., $P_j = 1/\mathit{id}_e$, where $\mathit{id}_e$ denotes the number of edges that share an end point with $e$. The parameters $\epsilon$ and $o$ in the Random reverse reachable-based Propagation Optimization approach are set to 0.1 and 1, respectively. Travel costs are calculated using Euclidean distance, and the speed of workers is set to 5 km/h. The default values of all parameters used in the experiments are summarized in Table II. In task assignment experiments, we run the algorithms over 4 days of a month on BK and FS, and we report average results. All experiments are run on a Linux (Ubuntu 16.04) machine with an Intel(R) Xeon(R) E5-2650 v4 2.20GHz processor and 256 GB of memory.

TABLE II: Parameter Settings
Parameter                            Default value
Number of tasks $|S|$                1500
Number of workers $|W|$              1200
Valid time of tasks $\varphi$        5 h
Workers’ reachable radius $r$        25 km

V-B Experimental Results

V-B1 Influence Modeling Performance

We first evaluate the performance of worker-task affinity, worker willingness, and worker propagation and their impact on worker-task influence. We consider the IA algorithm and three variants of it to study the contribution to worker-task influence of the three aspects. The methods are as follows:

i) IA: Our basic Influence-aware Assignment algorithm, which considers worker-task influence and aims to maximize total task assignment and worker-task influence.

ii) IA-WP: A variant of IA that considers worker willingness and worker propagation.

iii) IA-AP: A variant of IA that considers worker-task affinity and worker propagation.

iv) IA-AW: A variant of IA that considers worker-task affinity and worker willingness.

Since we aim to maximize the influence of tasks, we propose Average Influence, $\mathit{AI}$, to evaluate the performance of each algorithm, which is calculated as follows:

$$\mathit{AI} = \frac{\sum_{(s,w) \in A} \mathit{if}(w, s)}{|A|}, \tag{6}$$

where $\mathit{if}(w, s)$ is the worker-task influence of worker $w$ and task $s$, and $|A|$ is the number of assignments.
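Equation 6 is a plain average over the assignment set, as the short sketch below shows. The function name and data layout are assumptions, as is the choice to return zero for an empty assignment set.

```python
from typing import Dict, Iterable, Tuple


def average_influence(
    assignments: Iterable[Tuple[str, str]],
    influence: Dict[Tuple[str, str], float],
) -> float:
    """Equation 6: mean of if(w, s) over all assigned (task, worker) pairs."""
    pairs = list(assignments)
    if not pairs:
        return 0.0  # convention chosen here for an empty assignment set
    return sum(influence[(w, s)] for s, w in pairs) / len(pairs)


# Hypothetical assignment set A with two (task, worker) pairs.
ai = average_influence(
    [("s1", "w1"), ("s2", "w2")],
    {("w1", "s1"): 0.4, ("w2", "s2"): 0.2},
)
```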

Effect of $|S|$: We first study the effect of $|S|$ on AI. As illustrated in Figure 5, IA achieves the largest AI, followed by IA-AP, for any $|S|$. The reason is that IA considers worker-task affinity, worker willingness, and worker propagation, while none of the variants considers all three aspects. IA-AP performs better than IA-WP and IA-AW. This may be because the average probability of workers visiting the locations of tasks is small, so worker willingness carries less weight in computing worker-task influence than worker-task affinity and worker propagation. Another observation is that the AI of IA is highest when $|S| = 500$. The reason is that the number of workers who can generate larger worker-task influence is small and most of them are already selected when $|S| = 500$.

Figure 5: Effect of $|S|$. (a) Average Influence on BK; (b) Average Influence on FS

Effect of $|W|$: Next, we study the effect of $|W|$. From Figure 6, we can see that the AI of IA-WP is the lowest among these algorithms in most cases, which suggests that worker-task affinity plays a more important role in modeling worker-task influence than worker willingness and worker propagation.

Figure 6: Effect of $|W|$. (a) Average Influence on BK; (b) Average Influence on FS

Effect of $\varphi$: Next, we study the effect of $\varphi$. As can be seen in Figure 7, the AI of all methods fluctuates without a clear trend. This may be because the number of available tasks increases as $\varphi$ grows, so workers can be assigned tasks with high AI for any value of $\varphi$.

Figure 7: Effect of $\varphi$. (a) Average Influence on BK; (b) Average Influence on FS

Effect of $r$: As expected, the AI of all approaches fluctuates without a clear trend as $r$ changes (see Figure 8), since the number of available tasks grows when $r$ increases. The AI of IA is highest when $r = 25$. The reason is that a larger $r$ gives workers more chances to perform different tasks, so the assignment is more likely to have a larger AI.

Figure 8: Effect of $r$. (a) Average Influence on BK; (b) Average Influence on FS

V-B2 Performance of Influence-aware Task Assignment

Next, we evaluate the different task assignment algorithms.

Figure 9: Effect of $|S|$ on BK. (a) CPU Time; (b) Number of Assigned Tasks; (c) Average Influence; (d) Average Propagation; (e) Travel Cost
Figure 10: Effect of $|S|$ on FS. (a) CPU Time; (b) Number of Assigned Tasks; (c) Average Influence; (d) Average Propagation; (e) Travel Cost

i) MTA: The Maximum Task Assignment algorithm [11] that maximizes the number of assigned tasks by computing the maximum flow of the task assignment graph.

ii) IA: Our basic Influence-aware Assignment algorithm.

iii) EIA: Our Entropy-based IA algorithm.

iv) DIA: Our Distance-based IA algorithm.

v) MI: The Maximum Influence algorithm that is divided into two steps: 1) Select multiple workers for each task based on the spatio-temporal constraints; 2) assign a task to each worker to maximize the total worker-task influence.

When a worker knows a task, the worker will propagate the information of the task to other workers in the social network. More workers knowing the information of the task leads to larger worker-task influence. Thus we introduce a metric, Average Propagation, $\mathit{AP}$, to evaluate the performance of the task assignment algorithms.

$$\mathit{AP} = \frac{\sum_{(s_i, w_i) \in A} \sum_{w_j \in W \setminus \{w_i\}} P_{\mathit{pro}}(w_i, w_j)}{|A|}, \tag{7}$$

where $W$ is the set of all workers and $P_{\mathit{pro}}(w_i, w_j)$ is the probability that worker $w_j$ knows task $s_i$ from worker $w_i$.
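Equation 7 can be evaluated as below. The nested-dictionary layout for the propagation probabilities, the function name, and the zero result for an empty assignment set are assumptions made for this sketch.

```python
from typing import Dict, Iterable, Tuple


def average_propagation(
    assignments: Iterable[Tuple[str, str]],
    p_pro: Dict[str, Dict[str, float]],
) -> float:
    """Equation 7: average, over assigned (task, worker) pairs, of the summed
    probabilities P_pro(w_i, w_j) that the assigned worker w_i informs each other worker w_j.
    """
    pairs = list(assignments)
    if not pairs:
        return 0.0  # convention chosen here for an empty assignment set
    total = 0.0
    for _task, w_i in pairs:
        for w_j, prob in p_pro.get(w_i, {}).items():
            if w_j != w_i:  # the inner sum excludes the assigned worker
                total += prob
    return total / len(pairs)


# Hypothetical propagation probabilities for two assigned workers.
ap = average_propagation(
    [("s1", "w1"), ("s2", "w2")],
    {"w1": {"w1": 1.0, "w2": 0.5, "w3": 0.25}, "w2": {"w1": 0.25}},
)
```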

Four additional metrics are also used to compare the algorithms: 1) CPU time: the CPU time for computing a task assignment during a time instance; 2) the total number of assigned tasks; 3) $\mathit{AI}$; and 4) travel cost: the average travel cost for workers performing tasks.

Effect of $|S|$. We first study the effect of the number of tasks. We generate five datasets containing 500 to 2,500 tasks by random selection from the original dataset. As shown in Figures 9(a) and 10(a), the CPU time of all methods increases as $|S|$ grows. The reason is that a larger $|S|$ yields a larger task assignment graph, which takes more CPU time to process. The CPU time is highest for EIA, followed by IA, DIA, MI, and MTA. However, EIA assigns more tasks than the other methods (see Figures 9(b) and 10(b)), which demonstrates the benefit of the location entropy strategy. MI has the smallest number of assigned tasks but the largest Average Influence, $\mathit{AI}$ (see Figures 9(c) and 10(c)). This is because MI aims to maximize the total worker-task influence and does not try to maximize the total number of assigned tasks, which increases the value of $\mathit{AI}$. The $\mathit{AI}$ of IA is larger than that of DIA, EIA, and MTA. The reason is that EIA and DIA adopt the location entropy and travel cost strategies, respectively, which reduces the effect of worker-task influence. DIA takes into account the travel cost of workers, with the result that the worker willingness (see Equation 2) of DIA is larger than that of EIA. Thus, the $\mathit{AI}$ of DIA is larger than that of EIA for all values of $|S|$. As expected, the Average Propagation $\mathit{AP}$ of MI, IA, EIA, and DIA is larger than that of MTA (see Figures 9(d) and 10(d)). The reason is that worker propagation is considered in MI, IA, EIA, and DIA, while it is ignored in MTA. Since workers who can generate larger worker propagation have priority to perform tasks, workers with smaller worker propagation get more chances to perform tasks as $|S|$ increases.
Moreover, DIA yields the smallest average travel costs, as shown in Figures 9(e) and 10(e). This is because DIA takes the travel cost into account, giving workers who are closer to tasks higher priority to perform them. The average travel costs of all algorithms decrease as $|S|$ increases, since assigned tasks are more likely to be located near workers.

Figure 11: Effect of $|W|$ on BK. (a) CPU Time; (b) Number of Assigned Tasks; (c) Average Influence; (d) Average Propagation; (e) Travel Cost
Figure 12: Effect of $|W|$ on FS. (a) CPU Time; (b) Number of Assigned Tasks; (c) Average Influence; (d) Average Propagation; (e) Travel Cost
Figure 13: Effect of $\varphi$ on BK. (a) CPU Time; (b) Number of Assigned Tasks; (c) Average Influence; (d) Average Propagation; (e) Travel Cost
Figure 14: Effect of $\varphi$ on FS. (a) CPU Time; (b) Number of Assigned Tasks; (c) Average Influence; (d) Average Propagation; (e) Travel Cost

Effect of $|W|$: Next, we study the effect of $|W|$ by varying it from 400 to 2,000. Figures 11(a) and 12(a) show that the CPU time increases when $|W|$ grows. The reason is that more workers tend to have more available task assignments, which leads to more edges in the task assignment graph. Since more workers can take part in task assignment, more tasks can be assigned, so the number of assigned tasks grows with the number of workers (see Figures 11(b) and 12(b)). As shown in Figures 11(c) and 12(c), the $\mathit{AI}$ of MI, IA, EIA, and DIA is larger than that of MTA. Figures 11(d) and 12(d) show that the $\mathit{AP}$ of all methods fluctuates without a clear trend. The reason may be that workers are selected at random from the original datasets, so workers who can generate larger $\mathit{AP}$ may be selected for any value of $|W|$. Moreover, the average travel cost of DIA is the smallest, and that of MTA is the highest (see Figures 11(e) and 12(e)). The reason is that DIA takes workers’ travel costs into account, while MTA disregards any location information.

Figure 15: Effect of $r$ on BK. Panels: (a) CPU Time, (b) Number of Assigned Tasks, (c) Average Influence, (d) Average Propagation, (e) Travel Cost.

Figure 16: Effect of $r$ on FS. Panels: (a) CPU Time, (b) Number of Assigned Tasks, (c) Average Influence, (d) Average Propagation, (e) Travel Cost.

Effect of $\varphi$: As expected, the CPU costs of all methods increase as $\varphi$ grows (see Figures 13(a) and 14(a)). This occurs because workers can reach more tasks when $\varphi$ grows, which means that the number of available task assignments increases, i.e., more edges exist in the task assignment graph. As shown in Figures 13(b) and 14(b), the number of assigned tasks of all methods grows with $\varphi$. The reason is that the task assignment graph becomes larger with larger $\varphi$, which means that the probability of workers being assigned a task increases. Figures 13(c), 13(d), 14(c), and 14(d) show that the $\mathit{AI}$ and $\mathit{AP}$ of MI, IA, EIA, and DIA are larger than those of MTA. The average travel costs of MTA are larger than those of the other algorithms (see Figures 13(e) and 14(e)). Moreover, the average travel costs of all methods increase as $\varphi$ grows (see Figures 13(e) and 14(e)). The reason is that as $\varphi$ increases, workers become more likely to perform tasks with larger travel costs, which means that some workers are assigned tasks with larger travel costs. The average travel costs of EIA are larger than those of IA, DIA, and MI because EIA gives tasks with lower location entropy higher priority for assignment, which means that workers travel farther to reach tasks.
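The entropy-based prioritization mentioned for EIA can be illustrated with a minimal sketch. It assumes the commonly used definition of location entropy, namely the Shannon entropy of the distribution of recorded visits to a location over workers. The function names, the input format, and the toy visit logs are hypothetical, and the sketch is not the EIA algorithm itself.

```python
import math
from collections import Counter


def location_entropy(visitors):
    """Shannon entropy of the visit distribution over workers at one location.

    `visitors` lists the worker id of every recorded visit (assumed input format).
    """
    counts = Counter(visitors)
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        p = c / total  # share of this location's visits made by one worker
        entropy -= p * math.log(p)
    return entropy


def prioritize_by_entropy(task_visits):
    """Order task ids so that locations with lower entropy come first."""
    return sorted(task_visits, key=lambda t: location_entropy(task_visits[t]))


if __name__ == "__main__":
    # Toy data: "quiet" is visited by a single worker, "busy" by three.
    visits = {"busy": ["w1", "w2", "w3"], "quiet": ["w1", "w1"]}
    print(prioritize_by_entropy(visits))
```

A location visited by only one worker has entropy 0, so under this definition it would be served before a location whose visits are spread evenly over many workers.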

Effect of $r$: We proceed to consider the effect of $r$ by varying it from 5 to 25 km. Figures 15(a), 15(b), 16(a), and 16(b) show that the CPU time and the number of assigned tasks of all methods exhibit a similar increasing trend as $r$ grows. The reason is that as $r$ increases, more tasks are available in each worker’s reachable range, which means that each worker has a higher probability of being assigned a task. It can also be seen that the gap in the number of assigned tasks between EIA and the other approaches increases. The reason is that when $r$ grows, the number of tasks that are far from workers increases, and the probability that workers accept tasks that are far from them is small. When applying EIA, the tasks with fewer workers nearby have higher priority of being assigned, which increases the probability that workers accept tasks that are far from them and leads to more assignments. As illustrated in Figures 15(c), 15(d), 16(c), and 16(d), the $\mathit{AI}$ and $\mathit{AP}$ of MTA are lower than those of the other approaches, which demonstrates the superiority of the influence-aware assignment strategy. Since more tasks are assigned and workers can reach tasks with larger travel costs when $r$ grows, the average travel costs of all methods increase (cf. Figures 15(e) and 16(e)).

According to the above analysis, the time cost of MTA is the lowest, while its number of assigned tasks, Average Influence ($\mathit{AI}$), and Average Propagation ($\mathit{AP}$) are the smallest. The $\mathit{AI}$ of IA is larger than that of MTA, DIA, and EIA because those algorithms adopt different strategies to increase the number of assigned tasks, which reduces the effect of worker-task influence. EIA is more time-consuming but also achieves a larger number of assigned tasks than the other algorithms. The travel cost of DIA is the smallest since it takes travel costs into account when assigning tasks. The $\mathit{AI}$ and $\mathit{AP}$ of MI are the largest because MI aims to maximize the total worker-task influence.

VI Related Work

Spatial Crowdsourcing (SC) has been the subject of a range of studies [42, 5, 44, 45, 8, 46, 9, 47, 48, 49]. One of the core problems in SC is task assignment. Kazemi and Shahabi [11] consider two task publication modes, namely Worker Selected Tasks (WST) and Server Assigned Tasks (SAT). In WST mode, workers can choose nearby spatial tasks without the need to coordinate with the SC-server. In SAT mode, the server assigns tasks to workers with the aim of maximizing the number of assigned tasks [3, 50, 51, 1] or maximizing the number of tasks performed by a worker under an optimal schedule [13]. Zeng et al. [52] study a latency-oriented task completion problem that addresses the trade-off between quality and latency for task assignment. Cheng et al. [53] focus on cooperation-aware spatial crowdsourcing, where more than one worker is required to complete a task. In contrast to these studies, we study a novel task assignment problem based on worker-task influence.
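To make the SAT-mode objective of maximizing the number of assigned tasks concrete, the following is a minimal sketch of maximum bipartite matching via augmenting paths. It assumes that each worker performs at most one task and each task needs exactly one worker. The function name `max_assignment` and the `reachable` input format are hypothetical, and the sketch shows the generic technique rather than any specific algorithm from the cited studies.

```python
def max_assignment(reachable):
    """Maximum-cardinality worker-task matching via augmenting paths.

    `reachable` maps each worker id to the list of task ids the worker can take
    (assumed input format). Returns a dict mapping task id -> assigned worker id.
    """
    task_owner = {}

    def try_assign(worker, seen):
        # Try to give `worker` a task, re-assigning current owners if needed.
        for task in reachable.get(worker, []):
            if task in seen:
                continue
            seen.add(task)
            if task not in task_owner or try_assign(task_owner[task], seen):
                task_owner[task] = worker
                return True
        return False

    for worker in reachable:
        try_assign(worker, set())
    return task_owner


if __name__ == "__main__":
    # w1 initially takes t1, then is moved to t2 so that w2 can take t1.
    print(max_assignment({"w1": ["t1", "t2"], "w2": ["t1"], "w3": ["t2"]}))
```

This objective counts assignments only. An influence-aware variant would additionally weight each worker-task edge, which this sketch does not attempt.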

Next, quality assurance is a core challenge in spatial task assignment. Workers tend to complete tasks with good quality if a quality strategy exists. Zhao et al. [51] study preference-aware task assignment, which considers temporal preferences of workers. Zhao et al. [54] propose a preference-aware task assignment approach for on-demand taxi dispatching that aims to maximize the expected total profits. However, these studies simply infer workers’ preferences from historical task-performing records, and they ignore workers’ social impact.

Some recent studies try to improve task assignment based on social networks. Li et al. [14] focus on group task assignment, which employs social features to learn social impact-based preferences of different worker groups. Wang et al. [55] propose two algorithms, Basic-Selector and Fast-Selector, to select a subset of workers to maximize the temporal-spatial coverage. However, these studies ignore the interactions among all workers in social networks and workers’ long-term task-performing patterns.

VII Conclusion

In this paper, we take an important step towards effective task assignment in spatial crowdsourcing that takes into account worker-task influence. Unlike most existing studies that only consider real-time worker and task locations, we further consider social networks to capture the interactions among workers, and we employ historical task-performing records to extract long-term task-performing patterns of workers. We propose three task assignment algorithms that aim to maximize the number of assigned tasks and worker-task influence. To the best of our knowledge, this is the first study in spatial crowdsourcing that considers worker-task influence in task assignment. An extensive empirical study based on real-world data demonstrates that the proposed methods can significantly improve the effectiveness of task assignment.

Acknowledgment

This work is partially supported by NSFC (No. 61972069, 61836007 and 61832017), and Shenzhen Municipal Science and Technology R&D Funding Basic Research Program (JCYJ20210324133607021).

References

  • [1] Y. Zhao, K. Zheng, H. Yin, G. Liu, J. Fang, and X. Zhou, “Preference-aware task assignment in spatial crowdsourcing: from individuals to groups,” TKDE, 2020.
  • [2] P. Cheng, X. Lian, L. Chen, J. Han, and J. Zhao, “Task assignment on multi-skill oriented spatial crowdsourcing,” TKDE, vol. 28, no. 8, pp. 2201–2215, 2016.
  • [3] P. Cheng, X. Lian, L. Chen, and C. Shahabi, “Prediction-based task assignment in spatial crowdsourcing,” in ICDE, 2017, pp. 997–1008.
  • [4] T. Song, Y. Tong, L. Wang, J. She, B. Yao, L. Chen, and K. Xu, “Trichromatic online matching in real-time spatial crowdsourcing,” in ICDE, 2017, pp. 1009–1020.
  • [5] Y. Tong, L. Wang, Z. Zhou, L. Chen, B. Du, and J. Ye, “Dynamic pricing in spatial crowdsourcing: A matching-based approach,” in SIGMOD, 2018, pp. 773–788.
  • [6] Y. Tong, L. Wang, Z. Zhou, B. Ding, L. Chen, J. Ye, and K. Xu, “Flexible online task assignment in real-time spatial data,” PVLDB, vol. 10, no. 11, pp. 1334–1345, 2017.
  • [7] Y. Tong, Y. Zeng, Z. Zhou, L. Chen, J. Ye, and K. Xu, “A unified approach to route planning for shared mobility,” PVLDB, vol. 11, no. 11, p. 1633, 2018.
  • [8] J. Xia, Y. Zhao, G. Liu, J. Xu, M. Zhang, and K. Zheng, “Profit-driven task assignment in spatial crowdsourcing,” in IJCAI, 2019, pp. 1914–1920.
  • [9] Y. Zhao, Y. Li, Y. Wang, H. Su, and K. Zheng, “Destination-aware task assignment in spatial crowdsourcing,” in CIKM, 2017, pp. 297–306.
  • [10] G. Ye, Y. Zhao, X. Chen, and K. Zheng, “Task allocation with geographic partition in spatial crowdsourcing,” in CIKM, 2021, pp. 2404–2413.
  • [11] L. Kazemi and C. Shahabi, “Geocrowd: enabling query answering with spatial crowdsourcing,” in SIGSPATIAL, 2012, pp. 189–198.
  • [12] P. Cheng, X. Lian, Z. Chen, R. Fu, L. Chen, J. Han, and J. Zhao, “Reliable diversity-based spatial crowdsourcing by moving workers,” PVLDB, vol. 8, no. 10, pp. 1022–1033, 2015.
  • [13] D. Deng, C. Shahabi, and U. Demiryurek, “Maximizing the number of worker’s self-selected tasks in spatial crowdsourcing,” in SIGSPATIAL, 2013, pp. 324–333.
  • [14] X. Li, Y. Zhao, X. Zhou, and K. Zheng, “Consensus-based group task assignment with social impact in spatial crowdsourcing,” Data Science and Engineering, vol. 5, no. 4, pp. 375–390, 2020.
  • [15] X. Li, Y. Zhao, J. Guo, and K. Zheng, “Group task assignment with social impact-based preference in spatial crowdsourcing,” in DASFAA, 2020, pp. 677–693.
  • [16] J. Tang, X. Tang, X. Xiao, and J. Yuan, “Online processing algorithms for influence maximization,” in SIGMOD, 2018, pp. 991–1005.
  • [17] X. Chen, Y. Zhao, G. Liu, R. Sun, X. Zhou, and K. Zheng, “Efficient similarity-aware influence maximization in geo-social network,” TKDE, 2020.
  • [18] M.-C. Yuen, I. King, and K.-S. Leung, “Task recommendation in crowdsourcing systems,” in Proceedings of the First International Workshop on Crowdsourcing and Data Mining, 2012, pp. 22–26.
  • [19] H. To, G. Ghinita, and C. Shahabi, “A framework for protecting worker location privacy in spatial crowdsourcing,” PVLDB, vol. 7, no. 10, pp. 919–930, 2014.
  • [20] V. V. Vazirani, Approximation algorithms, 2013.
  • [21] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
  • [22] W. Gong, B. Zhang, and C. Li, “Location-based online task assignment and path planning for mobile crowdsensing,” IEEE Transactions on Vehicular Technology, vol. 68, no. 2, pp. 1772–1783, 2018.
  • [23] Q. Tao, Y. Tong, Z. Zhou, Y. Shi, L. Chen, and K. Xu, “Differentially private online task assignment in spatial crowdsourcing: A tree-based approach,” in ICDE, 2020, pp. 517–528.
  • [24] J. Han, J. Pei, and M. Kamber, Data mining: concepts and techniques, 2011.
  • [25] W.-Y. Zhu, W.-C. Peng, and L.-J. Chen, “Exploiting mobility for location promotion in location-based social networks,” in DSAA, 2014, pp. 76–82.
  • [26] W.-Y. Zhu, W.-C. Peng, L.-J. Chen, K. Zheng, and X. Zhou, “Modeling user mobility for location promotion in location-based social networks,” in SIGKDD, 2015, pp. 1573–1582.
  • [27] R. Singhai, S. D. Joshi, and R. K. Bhatt, “A novel discrete distribution and process to model self-similar traffic,” in ICT, 2007, pp. 167–172.
  • [28] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in SIGKDD, 2003, pp. 137–146.
  • [29] W. Chen, C. Wang, and Y. Wang, “Scalable influence maximization for prevalent viral marketing in large-scale social networks,” in SIGKDD, 2010, pp. 1029–1038.
  • [30] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier, “Maximizing social influence in nearly optimal time,” in SODA, 2014, pp. 946–957.
  • [31] Y. Tang, Y. Shi, and X. Xiao, “Influence maximization in near-linear time: A martingale approach,” in SIGMOD, 2015, pp. 1539–1554.
  • [32] A.-A. Stoica and A. Chaintreau, “Fairness in social influence maximization,” in WWW, 2019, pp. 569–574.
  • [33] X. Chen, L. Deng, Y. Zhao, X. Zhou, and K. Zheng, “Community-based influence maximization in location-based social network,” WWWJ, vol. 24, no. 6, pp. 1903–1928, 2021.
  • [34] F. Chung and L. Lu, “Concentration inequalities and martingale inequalities: a survey,” Internet Mathematics, vol. 3, no. 1, pp. 79–127, 2006.
  • [35] L. R. Ford and D. R. Fulkerson, “Maximal flow through a network,” Canadian Journal of Mathematics, vol. 8, pp. 399–404, 2009.
  • [36] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh, “Bridging the gap between physical location and online social networks,” in UbiComp, 2010, pp. 119–128.
  • [37] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in SIGKDD, 2011, pp. 1082–1090.
  • [38] A. Likhyani, S. Bedathur, and P. Deepak, “Locate: Influence quantification for location promotion in location-based social networks,” in IJCAI, 2017, pp. 2259–2265.
  • [39] Y. Zhao, K. Zheng, Y. Cui, H. Su, F. Zhu, and X. Zhou, “Predictive task assignment in spatial crowdsourcing: a data-driven approach,” in ICDE, 2020, pp. 13–24.
  • [40] Z. Wang, Y. Zhao, X. Chen, and K. Zheng, “Task assignment with worker churn prediction in spatial crowdsourcing,” in CIKM, 2021, pp. 2070–2079.
  • [41] K. Jung, W. Heo, and W. Chen, “Irie: Scalable and robust influence maximization in social networks,” in ICDM, 2012, pp. 918–923.
  • [42] Y. Tong, Z. Zhou, Y. Zeng, L. Chen, and C. Shahabi, “Spatial crowdsourcing: a survey,” VLDBJ, vol. 29, no. 1, pp. 217–250, 2020.
  • [43] D. Meng, Y. Jia, J. Du, and F. Yu, “Tracking algorithms for multiagent systems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 10, pp. 1660–1676, 2013.
  • [44] Y. Zhao, K. Zheng, J. Guo, B. Yang, T. B. Pedersen, and C. S. Jensen, “Fairness-aware task assignment in spatial crowdsourcing: Game-theoretic approaches,” in ICDE, 2021, pp. 265–276.
  • [45] Y. Zhao, J. Guo, X. Chen, J. Hao, X. Zhou, and K. Zheng, “Coalition-based task assignment in spatial crowdsourcing,” in ICDE, 2021, pp. 241–252.
  • [46] Y. Zhao, K. Zheng, Y. Li, H. Su, J. Liu, and X. Zhou, “Destination-aware task assignment in spatial crowdsourcing: A worker decomposition approach,” TKDE, vol. 32, no. 12, pp. 2336–2350, 2019.
  • [47] Z. Chen, P. Cheng, L. Chen, X. Lin, and C. Shahabi, “Fair task assignment in spatial crowdsourcing,” PVLDB, vol. 13, no. 12, pp. 2479–2492, 2020.
  • [48] Y. Li, Y. Zhao, and K. Zheng, “Preference-aware group task assignment in spatial crowdsourcing: A mutual information-based approach,” in ICDM, 2021, pp. 350–359.
  • [49] Y. Zhao, X. Chen, L. Deng, T. Kieu, C. Guo, B. Yang, K. Zheng, and C. S. Jensen, “Outlier detection for streaming task assignment in crowdsourcing,” in WWW, 2022.
  • [50] Y. Tong, L. Chen, Z. Zhou, H. V. Jagadish, L. Shou, and W. Lv, “Slade: A smart large-scale task decomposer in crowdsourcing,” TKDE, vol. 30, no. 8, pp. 1588–1601, 2018.
  • [51] Y. Zhao, J. Xia, G. Liu, H. Su, D. Lian, S. Shang, and K. Zheng, “Preference-aware task assignment in spatial crowdsourcing,” in AAAI, vol. 33, no. 01, 2019, pp. 2629–2636.
  • [52] Y. Zeng, Y. Tong, L. Chen, and Z. Zhou, “Latency-oriented task completion via spatial crowdsourcing,” in ICDE, 2018, pp. 317–328.
  • [53] P. Cheng, L. Chen, and J. Ye, “Cooperation-aware task assignment in spatial crowdsourcing,” in ICDE, 2019, pp. 1442–1453.
  • [54] B. Zhao, P. Xu, Y. Shi, Y. Tong, Z. Zhou, and Y. Zeng, “Preference-aware task assignment in on-demand taxi dispatching: An online stable matching approach,” in AAAI, vol. 33, no. 01, 2019, pp. 2245–2252.
  • [55] J. Wang, F. Wang, Y. Wang, D. Zhang, L. Wang, and Z. Qiu, “Social-network-assisted worker recruitment in mobile crowd sensing,” TMC, vol. 18, no. 7, pp. 1661–1673, 2018.